A seismic shift


After 25 years of divisive debate, the governments of the world unite in Paris to fight global warming. But the hard work must start now.


On 12 December, French foreign minister Laurent Fabius passed into existence a landmark agreement on global warming, and without a single word of discussion. The small green gavel produced only a soft crack at the United Nations climate summit in Paris, a sound quickly overwhelmed by a standing ovation. But that sound should echo. It ushered in a seismic shift in international environmental and economic policy. If everything goes according to plan, the reverberations will be felt around the world for decades — and perhaps centuries — to come.

The Paris agreement strengthens the previous goal of limiting warming to 2°C above pre-industrial levels, ultimately suggesting that governments should “pursue efforts to limit the temperature increase to 1.5°C”. Pushed by a coalition of island nations and some of the most vulnerable countries on Earth, this change offers a nod to scientific research, which suggests that even the 1°C of warming experienced thus far is already having effects. Current commitments to reduce emissions might put the world on a path to keep the rise in temperature below 3°C, and even that assumes substantial action in the decades to come. But all countries must revisit — and hopefully strengthen — their pledges every five years, beginning in 2020.

Despite the contradiction between commitments and goals, the Paris accord is a vast improvement over the last binding agreement to curb emissions. The 1997 Kyoto Protocol explicitly divided the world into two factions, rich and poor, and it required only rich nations to reduce their emissions. In so doing, it tried to address legitimate questions about equity and fairness. Poor nations argued — justifiably — that wealthy countries have profited immensely from fossil fuels, and that they were responsible for the bulk of historical greenhouse-gas emissions. They asserted their right to focus on lifting people out of poverty, while wealthy countries concentrated on bringing emissions down and developing technologies to enable everybody else to follow. It was a reasonable proposition — but it was destined to fail.

Emissions have continued to rise. Although most of the past emissions have come from wealthy nations, the bulk of those in the future will come from developing countries. Scientists have made it abundantly clear that every country must do everything that it can, and as fast as it can, if the world is to prevent the worst consequences of global warming.

The Paris agreement seeks to bridge the divide with carrots rather than sticks. Although countries agreed to engage in this new process, any action that they take to reduce emissions is on a purely voluntary basis. Indeed, the final change to the agreement in Paris, which took place quietly just minutes before the text was adopted, was to replace a ‘shall’ with a ‘should’ in a line stating how developed countries will commit to reducing emissions. This shift towards a voluntary framework based on national commitments was a necessary first step to bring everybody on board — and it worked.

Things may yet unravel. When negotiations pick up next year, the first task will be to spell out exactly what information countries need to submit regarding their emissions and commitments, and how the review process will work. Given that there are no penalties for failing to achieve a commitment, the foundation of this agreement is transparency.
Governments, scientists and advocacy groups need solid information to verify that everybody is living up to their commitments and to transfer knowledge about what works and what doesn't. The last — and often overlooked — piece of this puzzle is that developing countries will need help to establish the academic and technical expertise needed to meet these new international standards.

The Paris agreement represents a bet on technological innovation and human ingenuity. If governments follow through, companies and investors will shift resources towards clean energy to secure a place in an economy that will look very different several decades on.

In many ways, the debate about the long-term temperature-rise goal is symbolic. In the end, as noted in the agreement itself, the world needs to reduce net greenhouse-gas emissions to zero — and to do that, all countries must seek to halt the rise and bring down their emissions as soon as possible. Everybody has a role in making that happen. But today, the world can celebrate a win for global diplomacy.


Crop conundrum 


The EU should decide definitively whether 
gene-edited plants are covered by GM laws. 


When philosopher George Santayana said more than a century ago that those who do not learn from history are doomed to repeat it, he could have been predicting the European Union and its approach to genetically modified (GM) organisms.

As we report in a News story on page 319, the EU is dragging its feet 
over a legal ruling that could affect research and innovation for years 
to come. At stake is the use of gene-editing tools such as CRISPR- 
Cas9, which are revolutionizing biology. These techniques should 
theoretically trigger few safety alarms, yet they may be snared by the 
onerous legislation that has already added layers of bureaucracy to 
research involving conventional genetic engineering, and has slowed 
the cultivation of GM crops almost to a standstill in many nations. 

The new tools can be applied to create mutations that could have 
occurred naturally, and leave no trace of foreign genes in the product. 
Accordingly, the US Department of Agriculture has ruled in several
cases that the products do not have to be regulated as GM organisms. 

The European Commission is yet to send the same signal. In 
fact, it could decide that such products are governed by the existing 
cumbersome rules — its 2001 directive on the deliberate release of GM 
organisms into the environment. That would be a disaster for research. 

The commission represents the interests of 28 member states, which 
are deeply divided on issues of genetic modification. But it needs to 
make clear — soon and with no room for misinterpretation — that 
work with these new techniques is important and does not necessarily 
need to be regulated in the same way as the previous generation of 
GM crops. 

The precise and efficient gene-editing tools insert a gene that can 
create tiny, targeted mutations in an organism’s own genome. These 
mutations can permanently change the function of a host gene, change
its sensitivity to environmental cues or switch it off entirely; the foreign 
gene can then be bred out. 

The core legal issue is whether the 2001 directive applies to all 
products of genetic engineering, or only to organisms that have been 
altered in a way that could not occur naturally. Clauses in the directive 
mention both. 

Non-governmental organizations that are hostile to genetic engineering say that the directive is about the process by which products
are created. But legal analyses conducted in the past year by several 
member states — including Germany, which has been opposed to 
conventional GM crops — concluded that it is fundamentally about 
the products themselves. 

The commission’s own legal analysis, being handled behind firmly 
closed doors, is the one that will count. But the result has been 
repeatedly delayed, spreading immense uncertainty in the scientific 
community. 


It is now promised before the end of March. Why is it taking so long? The commission has strongly hinted that the matter will ultimately be settled in court. Its decision, when it comes, is bound to annoy parties on one side, which may then sue. The possibility that a decision that releases many gene-edited products from GM regulation could be overturned by a court will add to the community’s uncertainty.
There is some history here, and it should not be repeated. The commission tried, and failed, to resolve the lengthy disagreement over conventional GM crops by getting the European Court of Justice to rule on whether member states should be required to allow cultivation of such crops deemed safe by EU regulatory authorities. The court ruled that they should, but some countries banned it anyway. In a messy compromise, the EU now allows individual states to opt out.

The commission may be calculating that the reaction to a court 
ruling could be different this time, as a result of member states signalling their willingness to consider gene-edited products to be non-GM.

But letting a court decide a political issue is a poor option. It could 
take years. Even a positive verdict could rebound by reinforcing the 
narrative in some countries that the technology is being forced upon 
them. And it does not convey a positive message about legislation, 
which is supposed to reflect the will of the people. 

The commission should indicate that the spirit of the 2001 directive 
does not cover the impact of the new gene-editing tools, and should 
give them an appropriate green light — with encouraging enthusiasm. 
If the exact wording of the 2001 directive gives room for doubt, then 
it should be updated to reflect a world in which new science has long 
overtaken the old. 

Whatever the decision, the uncertainty must be lifted to allow 
research to proceed, and quickly.


Science for peace 


The German research community can benefit 
from the influx of migrants. 


This year’s refugee crisis — a result of the civil war in Syria and
enduring instability in the Middle East and Africa — has 
become an acid test for the European Union. 

Although some countries would rather pull up the drawbridge 
where refugees are concerned, Germany has generously welcomed 
nearly one million migrants this year, without regard for the costs or 
logistical burden involved. “We can do it!” Chancellor Angela Merkel 
never failed to remind German citizens. 

However, as police, immigration authorities, communities and 
volunteers creak under the strain, Merkel’s optimism is increasingly 
being denounced in some quarters. To integrate hundreds of thousands of traumatized, mostly Muslim, war refugees into Western society is a massive social challenge. But, contrary to what some critics seem to assume, early signs show that the young refugees — and
under-25s make up around half of the influx — will not be inclined to 
accept social welfare and sit back idly for long. Robbed of their hopes 
and dreams at home, many will grasp the opportunities offered. 

And many will be eager to learn. If admitted into Germany's well- 
oiled education and science system (and into its booming labour 
market at large), they can be a boon rather than a burden to the 
country’s knowledge-based economy. 

German universities and science organizations are aware of the 
responsibility to these displaced people and the opportunity they 
represent. The messages they send in favour of openness and plurality — defining features of any honest science — are laudable at a time when xenophobia is on the rise elsewhere.

Thanks to several programmes and initiatives launched by the 
German science community in recent months, refugee students can 
access university education and doctoral-research opportunities, and 
qualified refugee scientists and scholars can participate in advanced 
science at research institutes across Germany (see page 320). These 
initiatives are much-needed and deserve every respect. 

Refugees are expected to continue to arrive in Europe in large 
numbers, often lacking documentation of their professional or 
academic qualifications. Opportunities must continue to be available to them, and more must be helped to connect with potential
employers, in and outside of academia. 

Online tools such as the European Commission’s Science4Refugees portal, on which employers can post job opportunities and refugees seeking science jobs can post their CVs, are well meant but not (yet) frequently used. Learned academies, universities and science organizations throughout Europe should more clearly and proactively promote the message that students, scholars and scientists who have been forced to flee their home can rebuild their careers as well as their lives.

Social researchers who study education, mobility and integration — for whom the current wave of migration is a research opportunity — must strive to empirically challenge presumptions about refugees’ allegedly low level of qualification and susceptibility to political or religious extremism. To be sure, these things need to be — and
will be — thoroughly investigated. But the idea touted by some that 
Muslim values are a fundamental obstacle to successful integration 
into a modern secular society is wrong and hopelessly short-sighted. 

Whatever critics might say, Germany’s rebirth as a haven for the persecuted is a powerful gesture of peace. Embracing refugees, while assuring anxious citizens that openness need not threaten their own quality of life, is perhaps the most pressing social challenge faced by science in these times.


NATURE.COM

To comment online, 
click on Editorials at: 
go.nature.com/xhunqv 


© 2015 Macmillan Publishers Limited. All rights reserved 


ANDREA ARMSTRONG 


WORLD VIEW


Use data to challenge mental-health stigma

Web surveys of attitudes towards mental illness reveal the size of the problem — and offer a way to find fixes, says Neil Seeman.


The US National Institute of Mental Health considers stigma to be the most debilitating aspect of a mental illness. It is easy to see why. Stigma increases mental distress and leads to shame, avoidance of treatment, social isolation, and, consequently, a deterioration in health.

What form does this stigma take? Is it decreasing for mental illnesses such as depression, as claimed by some media articles? How can it be combated? We don't know the answers to those questions. That is partly because not enough people have asked them — and partly because not enough people have answered them. Surveys are expensive, and funds, especially for research on mental illness, are limited.

Surveys in the old days saw pollsters with hand-held clipboards quizzing shoppers in department stores. This gave way to the ubiquitous telephone survey. Today, the Internet affords ever more ways to collect survey data. Some years ago, I developed a way to ask questions in an efficient and global manner. It is called Random Domain Intercept Technology and it relies on people — like you — making mistakes while browsing the Internet. Mistyped URLs and broken web links trigger the survey, and invite the user to participate.

Unlike surveys in which people are given cash or rewards to answer questions, this method does not allow for a long-form questionnaire, although it can break down long surveys into shorter mini-surveys. It permits brief questions — often 8 to 15 of them — to be asked, and answered on a voluntary, non-incentivized basis by large numbers of random and anonymous people using the Internet. And that means almost everywhere in the world.

From September 2013 until May this year, we used the technology to ask some simple questions about mental illness and stigma. More than 1 million people from 229 territories responded. Their responses offer a unique and real-time snapshot of how the globe thinks about the estimated one-quarter of its population who will experience mental ill health (N. Seeman et al. J. Affect. Disord. 190, 115–121; 2016).

The survey requested age and gender, and then asked two specific questions. First, is there someone you interact with every day who suffers from mental illness? (This may include psychosis, depression or addiction.) And second, are people who suffer from mental illness any of the following: more lazy, more violent, suffering from a condition as serious as physical illness, the victims of bad parenting, or able to overcome their challenges through ‘tough love’?

NATURE.COM
Discuss this article online at:
go.nature.com/Isux4f


In developed countries, only 7% of respondents thought that people with mental illness were more violent than the general population. In remarkable contrast, about 15% of those in developing countries thought that people with mental illness were more violent. Although 45–51% of respondents from developed countries believed that mental illness is similar to physical illness, only 7% of the same people thought that mental illness can be overcome. It seems that the understanding that mental illness has a biological cause makes the public more, rather than less, pessimistic about outcome. This has been reported previously, and is, at first glance, counterintuitive. Attributing illness to genes takes away blame, but at the same time, takes away hope for change.

Although the identity of individual respondents is unknown, the overall reproducibility of responses from any one region is high. When the same questions were posed every month in India for 21 months running, 10% of respondents each time reported that people with mental illness are more violent than others.

And despite the fact that mental illness is often a taboo subject, the anonymity of the survey facilitated consistent answers. In China, for example, people with mental illness are often viewed as bringing shame on their family. The ‘loss of face’ associated with mental illness there and in many developing countries attaches not only to the ill person, but also to family members. In this context it makes sense, therefore, that people with mental illness are kept at home, and this may explain the high proportion of people in China who reported having daily contact with a mentally ill person.

The approach I describe can uncover views on 
any topic held by those in Internet-enabled areas, currently 43% of 
the planet. And it can allow for ‘before and after’ surveys, assessing 
the effectiveness of population-wide interventions. 

For instance, it would be of immense value to repeat this stigma 
survey in a region that has introduced a public-education anti- 
stigma campaign. The tool is not limited to stigma — in the field of 
mental health, for instance, it can probe suicidal ideas and, again, 
evaluate a suicide-prevention intervention. It can probe symptoms 
of post-traumatic stress disorder in the wake of a disaster (such as 
a hurricane or the Paris terrorist attacks) and test ways to mitigate 
these traumas. 

Measuring a social problem on the scale of mental-illness stigma does not make it go away. But at least it shows us the size of the challenge — and could very well help to find ways to fix it.


Neil Seeman is chief executive of the RIWI Corporation and a senior 
fellow at Massey College, University of Toronto, Toronto, Canada. 
e-mail: neil@riwi.com 
RESEARCH HIGHLIGHTS

Selections from the scientific literature


ASTROPHYSICS 


Cosmic boost 
reveals dim galaxy 


Astronomers have spied the 
faintest object ever seen in the 
early Universe. 

Leopoldo Infante at the 
Pontifical Catholic University 
of Chile in Santiago and his 
team used NASA's Hubble 
and Spitzer space telescopes 
to study distant objects. They 
examined sections of the sky 
through a dense cluster of 
galaxies, which bends and 
magnifies incoming light, and 
found 22 faint galaxies. The 
oldest one was observed as 
it was 13.4 billion years ago, 
around 400 million years after 
the Big Bang. 

The small, dim galaxy
was named Tayna, meaning
‘firstborn’ in the Native South
American language Aymara.
It may be more representative 
of the first galaxies than other 
distant, brighter examples, say 
the authors. 

Astrophys. J. 815, 18 (2015) 


Missed mutations
in cancer genomes

A comparison of cancer-genome sequences produced by 18 different research teams reveals that less than half of cancer-linked mutations were identified by all the groups. This suggests that differences in experimental procedures and analysis could reduce the accuracy of cancer-genome sequencing, which is increasingly used in the clinic.

Ivo Gut at Spain's National Centre for Genomic Analysis in Barcelona, together with researchers in the International Cancer Genome Consortium, looked for genetic differences in cancerous and healthy tissue from the same person. They then compared these results with a benchmark that used ten times more sequencing data than usual. Out of more than 1,200 single-letter mutations, only 40% were identified by all 18 teams.

DNA preparation and other parameters can be optimized easily to improve sequencing accuracy, the authors say.
Nature Commun. 6, 10001 (2015)


ENVIRONMENTAL SCIENCE

Ocean plastic piling up fast

Up to 240,000 tonnes of plastic particles are polluting the world’s oceans — at least three times more than previous estimates.

Each year, 5 million to 13 million tonnes of plastic ends up in the sea, where it slowly degrades into microplastic particles that threaten marine ecosystems. Erik van Sebille at Imperial College London and his colleagues analysed 40 years of data on plastic collected from surface-trawling plankton nets — more information than in previous studies. By combining those data with sophisticated ocean-circulation models, they estimated that the oceans contain 93,000–236,000 tonnes of microplastic particles. This represents just 1% of ocean plastic: the rest lies intact (pictured) on the sea floor or shore, or trapped in marine organisms, the authors suggest.
Environ. Res. Lett. 10, 124006 (2015)


Rising sea levels 
alter Earth’s spin 


Researchers have confirmed 
that rising sea levels caused 
by melting glaciers are 
slowing Earth’s rotation.

As ice melts, it redistributes 
mass across the planet’s 
surface, slightly changing the 
rate at which Earth spins. But 
a 2002 study could not explain 
the observed rotational 
changes on the basis of its 
assumptions about rising sea 
levels. Now Jerry Mitrovica 
of Harvard University in 
Cambridge, Massachusetts, 
and his colleagues say that they 
have resolved the problem. 
They used updated numbers 
for global sea-level rise, 
which are lower than those 
assumed in the 2002 study, 
and recalculated how the
geographic poles have shifted
over the past 3,000 years.
The work improves
scientists’ understanding
of how Earth’s rotation has
changed in the past, and
how rising sea levels might
continue to alter it in the
future.
Sci. Adv. 1, e1500679 (2015)


ASTRONOMY 


Galaxies caught 
in cosmic web 


Astronomers have discovered 
eight massive young galaxies 
within what might be a large 
web of dark matter. 

Ordinary matter, including 
galaxies, is thought to have
aggregated along threads 
of dark matter in the early 
Universe. But the progenitors 
of today’s galaxies are often 
shrouded in clouds of 
dust, making it difficult for 
astronomers to spot them and 
test this theory. 

Hideki Umehata at 
the European Southern 
Observatory in Garching, 
Germany, and his colleagues 
used the high-resolution
Atacama Large Millimeter/submillimeter
Array in Chile to make detailed
observations of a narrow slice
of the sky. They compared
their results with previous 
surveys of the region to find 
the galaxies, which were 
more than 3.4 billion parsecs 
(11 billion light years) away 
and producing hundreds of 
millions of new stars each year. 

The study supports the idea 
that big galaxies form in areas 
with a high concentration of 
dark matter. 
Astrophys. J. Lett. 815, L8 (2015) 


Possible pause in 
Arctic sea-ice loss 


An expected slowdown of 
large-scale heat circulation 
in the Atlantic Ocean could 
temporarily halt the decline of 
Arctic sea ice (pictured). 
Stephen Yeager at 
the National Center for 
Atmospheric Research in 
Boulder, Colorado, and his 
colleagues used an Earth- 
system model to analyse the 
causes of decadal trends in 
sea-ice extent in the North 
Atlantic. They found that 
the drastic retreat of sea ice 
since 1990 coincided with a
strong Atlantic circulation
that brought warm surface
water from the tropics 
to high latitudes. If this 
circulation were to weaken, 
as observations suggest that it 
will, less heat arriving in the 
Arctic Ocean will probably 
lead to a pause in winter 
sea-ice loss over the next 5 to 
10 years, the authors conclude. 
They add, however, that the 
rate of sea-ice melting could 
jump back up afterwards as 
global warming continues. 
Geophys. Res. Lett. http://doi.org/9wz (2015)


EVOLUTION 


How birds spread 
around the globe 


The common ancestor of 
all modern birds lived 
in South America some 
95 million years ago. 

Birds inhabit every 
continent, and are among 
the most diverse vertebrate 
groups on Earth. To chart 
birds’ rise and spread, Santiago 
Claramunt and Joel Cracraft 
at the American Museum of 
Natural History in New York 
created an evolutionary tree 
based on DNA sequences from 
230 bird species and fossil 
records for 130 extinct species. 

They found that bird 
diversity expanded rapidly 
after the demise of dinosaurs 
some 66 million years ago, 
dispersing along two primary 
routes. From South America, 
birds moved into North 
America, spread to Eurasia 
through the Arctic and then 
on to Africa. Birds arrived in 
Australia by way of Antarctica. 
Sci. Adv. 1, e1501005 (2015)


MATERIALS 


Electrons dance in 
pulled graphene 


Stretching an atom-thick strip of carbon could mimic the effects of a magnetic field, changing the behaviour of electrons so that the effect is 100 times stronger than that from normal magnets.

Teng Li at the University of Maryland in College Park and his colleagues calculated how to engineer the large pseudomagnetic fields that are produced when graphene is pulled from two ends. This strains bonds between carbon atoms, causing their electrons to move in a way that is similar to what happens in a magnetic field. The team found that a small tug (of up to 15% stretch) on certain shapes of graphene strip could produce a strong, nearly uniform field.

The designer shapes could help researchers to study the properties of graphene under extreme conditions — such as large magnetic fields — that are usually unattainable, the authors say.

Phys. Rev. Lett. 115, 245501 (2015)


Mini Fallopian
tubes in a dish


Human Fallopian tubes contain adult stem cells that, when grown in the lab, can form miniature 3D structures resembling Fallopian tubes (pictured).

Thomas Meyer at the Max Planck Institute for Infection Biology in Berlin and his colleagues isolated cells from human Fallopian-tube samples and grew them in 3D cultures. Two weeks later, they saw mature ‘organoids’ that had folds in the tissue, hair-like structures called cilia, and secretory cells — all characteristics of the Fallopian tube. The organoids were stable for more than 16 months and sensitive to the hormones oestradiol and progesterone. The organoids could be used to study tube pathology and certain types of ovarian cancer that are thought to originate in the Fallopian tubes, the authors say.
Nature Commun. http://doi.org/9wr (2015)


Popular topics


Deleting journal names triggers debate


Michael Eisen has long argued that research papers should be judged on the basis of their content, not on which journal they were published in. On 6 December, Eisen — a biologist at the University of California, Berkeley, and co-founder of the open-access publisher PLOS — decided to prove his point. He revamped his laboratory's website and announced on Twitter: “Made a new lab website — completely scrubbed any mention of journal titles — http://www.eisenlab.org//publications.html”. A few other scientists followed suit, and one even went a step further. Plant geneticist Jeffrey Ross-Ibarra at the University of California, Davis, tweeted: “Following @mbeisen, removed journal names from website. But also links to cites, almetrics, [sic] & preprints. http://www.rilab.org/pubs.html”. Others were sceptical. Manolis Dermitzakis, a geneticist at the University of Geneva, Switzerland, posted: “I don't see the point. The paper is published in a journal so this is just artificial. Or publish your papers on your website only.”
SEVEN DAYS


POLICY 


EU data-mining 

The European Commission 
confirmed on 9 December 
that it wants to propose 
legislation to exempt certain 
types of text and data mining 
from copyright laws. As part 
of wider copyright reform, 
public-interest research 
organizations would be 
allowed to mine text and 
data from journal articles for 
research purposes without 
having to ask permission 
from the copyright owner. 
Researchers worried about 
legal restrictions on the data 
mining have long campaigned 
for the change. 


Gain of function

The US National Science Advisory Board for Biosecurity will convene on 7 January in Bethesda, Maryland, to assess the risks and benefits of ‘gain-of-function’ research — work intended to increase the virulence, transmissibility or host range of pathogens. The meeting will consider the findings of a 1,006-page risk–benefit assessment by the Gryphon Scientific consultancy in Takoma Park, Maryland, published on 11 December. The United States introduced a moratorium in October last year on federal funding of such research on the agents that cause influenza, severe acute respiratory syndrome (SARS) and Middle East respiratory syndrome (MERS).


Venus probe enters orbit


Japan’s Akatsuki probe is circling Venus on an even closer orbit than mission managers had hoped for, the Japan Aerospace Exploration Agency announced on 9 December. In 2010, Akatsuki missed its first chance to enter into orbit; it made a second, successful attempt this month. At its closest approach, the probe will fly just 400 kilometres above Venus’s surface, from which point researchers aim to study the planet’s atmosphere. Three of the craft’s five cameras have already been confirmed as functional after their extra five years in space. This image was taken by the ultraviolet imager from about 72,000 kilometres above Venus’s surface.


51 trillion

The upper estimate of how many pieces of plastic smaller than 5 millimetres across had accumulated in the world’s oceans by 2014. The lower estimate is 15 trillion.

Source: E. van Sebille et al. Environ. Res. Lett. 10, 124006 (2015).


Paris deal done 


Negotiations at the Paris 
climate-change talks sealed 
a deal between 195 nations 
to limit warming to “well
below” 2 °C above pre-industrial
temperatures.
The 32-page package was
made on 12 December
after 2 weeks of talks, and
commits most nations to
significant reductions in 
carbon emissions. The 
agreement notes that 
vulnerable low-lying 
countries are set to face 
rising sea levels and stronger 
storms. See page 315 for 
more. 


Open-data accord 


Four international science lobby groups have launched a joint accord supporting open data as a tool for more-equitable science. The initiative, announced on 9 December in Pretoria during the first Science Forum South Africa, attempts to make it easier for developing countries to participate in research on a global level. It is also the first attempt to unify the fragmented activities of the four bodies, which represent different disciplines and global regions: the International Council for Science, the InterAcademy Partnership, the International Social Science Council and the World Academy of Sciences.


DOE science chief 
Cherry Murray, a physicist
at Harvard University in
Cambridge, Massachusetts,
will be the new director of the
US Department of Energy
(DOE) science office. The
US Senate confirmed her
appointment on 10 December.
The decision is considered 
surprising because most recent 
federal appointments have 
been blocked by the Senate — 
the previous nominee for the
office, Marc Kastner, was
not confirmed after his 2013
nomination. Murray, an expert 
in condensed-matter physics 
and photonics, will take office 
this month. 


FUNDING
AIDS funding cut 


In a readjustment of priorities announced on 11 December, the US National Institutes of Health (NIH) will no longer put 10% of its science budget towards AIDS research, overturning a requirement that had stood for more than 20 years. The 10% policy had been controversial, with opponents arguing that the number of HIV/AIDS deaths dropped precipitously during this time. The NIH director’s advisory council said that, as existing grants end, the move will eventually free up hundreds of millions of dollars for research on other diseases.




The agency will refocus its remaining AIDS budget away from basic biology and towards the creation of specific therapies and vaccines.


Animal names safe 
Thanks to gifts totalling S$1.35 million (US$959,000), the International Commission on Zoological Nomenclature (ICZN) secretariat will be able to continue its role of ensuring that animal species are named in a systematic fashion. The commission had been facing insolvency. Based at the National University of Singapore, the ICZN enforces a globally accepted nomenclature code to ensure that each species has a unique and scientifically appropriate name; around 15,000 new species are described annually. The philanthropic Lee Foundation in Singapore provided nearly all of the endowment, the ICZN announced on 14 December in Berlin at a joint meeting with the International Union of Biological Sciences.


FACILITIES 


Stellarator is go 


The world’s largest ‘stellarator’ fusion device roared into life on 10 December. The €1-billion (US$1.1-billion) Wendelstein 7-X, based at the Max Planck Institute for Plasma Physics in Greifswald, Germany, produced its first plasma (pictured), lasting for one-tenth of a second and reaching a temperature of around 1 million °C. Although the test run used helium, next year the device will start superheating hydrogen in experiments designed to explore the suitability of the technique for commercial fusion.

The stellarator confines ionized gas using intricately interwoven magnetic coils. The design is difficult to construct but potentially a more stable alternative to the doughnut-shaped ‘tokamak’ used by the international ITER fusion project, based in southern France.


TREND WATCH

The source of tantalum, a metal used in the electronics industry and for specialized mechanical parts, has shifted dramatically since 2000, according to a US Geological Survey report. In 2000, Australia was the world’s main source of tantalum (producing 45%), but in 2014 Rwanda produced most (50%). Tantalum is a ‘conflict mineral’, meaning that its sale may finance conflict in countries such as the Democratic Republic of the Congo, and buyers must check the metal’s source. See go.nature.com/wog3zu for more.

TANTALUM SOURCES SHIFT

The location of the biggest tantalum producers has changed significantly in the past 15 years.

SOURCE: US GEOLOGICAL SURVEY


BUSINESS
NEON Inc. out

The US National Science Foundation (NSF) has decided to replace the manager of the beleaguered US$434-million National Ecological Observatory Network, the company NEON, Inc. The decision comes after the company told the NSF in June that it was running US$80 million over budget. That triggered a congressional hearing and a warning from the NSF that it might oust NEON, Inc. in favour of another operator. The construction of the remaining observatory sites will probably be overseen by another company.


Chemicals combine

Two of the world’s largest chemical and agricultural companies, Dow Chemical of Midland, Michigan, and DuPont of Wilmington, Delaware, will attempt to merge. On 11 December, the companies announced that, subject to regulatory approval, they would combine forces to create a firm valued at US$130 billion. The merged firm would then break apart into three independent companies: one focused on agriculture, another on materials science and the third on speciality products.


Dengue vaccine

The first vaccine for preventing the tropical disease dengue fever has been approved for use in Mexico. The vaccine, Dengvaxia, developed by Sanofi Pasteur of Lyon, France, was approved on 9 December by Mexico for patients aged 9 to 45 who live in areas where dengue is endemic. The viral infection is carried by mosquitoes, and the number of infections worldwide has risen rapidly in recent years. The vaccine protects against the four variants of the dengue virus, and was approved after a clinical-development programme that involved more than 40,000 people in 15 countries.


SEVEN DAYS | THIS WEEK

COMING UP

18–21 DECEMBER
The European Society for Medical Oncology holds its Asia Congress in Singapore.
go.nature.com/6vwgoh

19–22 DECEMBER
The International Liposome Society gathers its members at University College London to discuss the use of liposomes in drug and vaccine delivery.
go.nature.com/jmj6uy


Open intelligence 


A group of individuals and 
companies from Silicon 
Valley in California have 
formed a non-profit 
company to research 
artificial intelligence (AI) 
that is “likely to benefit 
humanity as a whole”. The 
company, OpenAI, has 
raised US$1 billion and is 
co-chaired by Elon Musk, 
chief executive of the 
electric-car company Tesla 
Motors and private space- 
flight firm SpaceX. Musk 
has previously urged caution 
when it comes to AI, warning 
that it could become “more 
dangerous than nukes”. 


> NATURE.COM 
For daily news updates see: 
www.nature.com/news 
NEWS IN FOCUS


Roots of malignancy drive debate over role of ‘bad luck’ in disease p.317

China and South Korea agree to territorial talks p.318

Scientists help tap into immigrants’ potential p.320

The scientific myths that evidence can’t kill p.322


French foreign minister Laurent Fabius, chairman of the Paris talks, gives the new climate accord two thumbs up. 


Nations adopt historic 
global climate accord 


Agreement commits world to holding warming ‘well below’ 2 °C. 


BY JEFF TOLLEFSON & KENNETH R. WEISS, 
PARIS 


When the gavel came down for the final time at the climate summit in Paris on 12 December, representatives from 195 countries erupted into cheers. They had approved a landmark plan to combat climate change after two weeks of gruelling negotiations. The agreement commits most countries to reduce their greenhouse-gas emissions, while seeking to protect low-lying islands from rising seas and helping poor nations to develop their economies without relying on cheap, dirty fossil fuels.

The accord, years in the making, seeks to hold warming “well below” 2 °C above pre-industrial temperatures. Countries’ current climate pledges fall short of that goal, but many scientists and governments see the Paris agreement as the last, best hope to set the planet on a course to avoid catastrophic climate change.

“History is written by those who commit, not those who calculate,” French president François Hollande told negotiators after the accord was adopted. “Today you have committed.”

The ambitious 32-page package contains 
a multitude of provisions to accelerate the 
world’s transition from fossil fuels to solar, 
wind, nuclear, hydropower and other clean 
energy sources. 

Nearly every country is asked to play its part in ensuring that greenhouse-gas emissions peak, and then begin to decline, as soon as possible. Countries will assess their progress towards reducing emissions in 2018, and must revisit their climate pledges every five years, beginning in 2020. The aim is that these pledges will become more ambitious over time.

To ensure that countries are keeping to their commitments, the agreement creates a transparent system for measuring, reporting and verifying emissions, while allowing some flexibility for countries that have little capacity to do so. The plan allows for an independent technical review, and all but the smallest, poorest countries will have to report their emissions every two years. But negotiators have left many of the details to be debated at the next major climate talks, in 2016.

“On transparency, the agreement is a little bit loosey-goosey,” says Michael Oppenheimer, a climate scientist at Princeton University in New Jersey. “It could be turned into something that is very effective, but the delegates kicked the can down the road.”

Others worry about how developing countries can be helped to build their capacity to monitor emissions. “Transparency and governance are not something you obtain with a decree,” says Joseph Armathé Amougou, director of Cameroon’s National Observatory on Climate Change. He will be responsible for developing and reporting his country’s greenhouse-gas inventory, but he currently has neither the budget nor the employees to do so.

The Paris agreement includes non-binding language that outlines a plan for wealthy nations to increase their climate aid to poorer nations beyond their current commitment of US$100 billion per year by 2020. And developing nations pushed successfully for the pact to recognize that vulnerable countries will face damage from rising seas, raging storms and other impacts of climate change.

The official recognition of damage was a huge achievement, says Mohamed Adow of Christian Aid, an advocacy group based in London. “We now have loss and damage as an integral part of the climate regime.” But the pact explicitly bars poorer nations from seeking compensation or from holding wealthy, major polluters liable for these losses.


AN ARDUOUS JOURNEY 

Weary negotiators, running on nervous energy 
and caffeine, approved the Paris agreement a 
day after their self-imposed deadline — and 
only after a major push by leaders of the United 
Nations and the host country. 

In a soaring speech, Hollande implored delegates to pass an accord that would send “a message of life” to rebuke the perpetrators of the terrorist attacks that killed 130 people in Paris on 13 November. “I will be delighted, relieved, proud, that it be launched from Paris, because Paris was attacked almost exactly a month ago,” he said. “France asks you, calls upon you, to adopt the first universal agreement on climate.”

TIGHT BUDGET

Major greenhouse-gas emitters have pledged to reduce their carbon footprints, but holding warming to 2 °C will be a challenge.

Chart key: United States, European Union, India, China, rest of world. This sample emissions pathway has a 66% chance of limiting warming to 2 °C.

The long road to the Paris agreement began in Rio de Janeiro in 1992, when nations approved a general ‘framework’ to combat climate change that left the details for later agreements. After 20 annual meetings with little progress to curb ever-soaring emissions, representatives arrived in Paris with pledges from 187 countries that outlined the steps each would take to cut its emissions by 2030. Never before had so many promises been on the table — but many pledges were hedged with conditions, such as calls for financial aid to build alternative energy plants, save remaining forests or relocate people living in harm’s way. Even if all of the promises were fulfilled, and were followed by substantial additional emissions reductions, the world would warm 2.7 °C by 2100 (see ‘Tight budget’). This is deep into the territory that scientists expect would prompt catastrophic, irreversible climate changes.

Yet the Paris agreement seeks to limit planetary warming to well below 2 °C, urging nations to pursue an even stricter target, 1.5 °C. To put this in perspective, the average global temperature has already risen 1 °C since the start of the Industrial Revolution.

Many environmentalists say that the agreement and the goals are strong enough to create momentum and put pressure on governments moving forward. “We see the key elements that we have always said we need for a good agreement,” says Nathaniel Keohane, who heads the global climate programme for the Environmental Defense Fund in New York City.

> NATURE.COM
For Nature’s full coverage of the Paris talks, see:
go.nature.com/c7146j

316 | NATURE | VOL 528 | 17 DECEMBER 2015

© 2015 Macmillan Publishers Limited. All rights reserved

Others say that the Paris accord should prod businesses to pursue clean energy and green growth. “Markets now have the clear signal that they need to unleash the full force of human ingenuity and scale up investments that will generate low-emissions, resilient growth,” said United Nations secretary-general Ban Ki-moon. “What was once unthinkable has now become unstoppable.”

Climate scientists who gathered in Paris to observe the negotiations were pleased with the accord’s ultimate goal, but wanted more details about how nations would achieve significant emissions reductions. “This does not send a clear signal about the level and timing of emission cuts, and does not provide a useful yardstick against which to measure progress,” says Steffen Kallbekken, research director at the Center for International Climate and Energy Policy in Oslo. Although the Paris plan is not inconsistent with the science, he says, it does not reflect the best available research.

The Intergovernmental Panel on Climate Change (IPCC) has concluded that holding warming to 2 °C will probably require emissions to be cut by 40–70% by 2050 compared with 2010 levels, Kallbekken notes. Achieving the 1.5 °C target would require substantially larger emissions cuts — of the order of 70–95% by 2050.

The Paris agreement directs the IPCC to 
study scenarios for limiting warming to 1.5 °C, 
and to deliver a report to nations by 2018 to 
help them determine how much to strengthen 
their climate commitments. 

The fact that the accord prominently mentions the 1.5 °C target is a huge victory for vulnerable countries, says Saleemul Huq, director of the International Centre for Climate Change and Development in Dhaka, Bangladesh. “Coming into Paris, we had all of the rich countries and all of the big developing countries not on our side,” says Huq, an adviser to a coalition of least-developed nations. “In the 14 days that we were here, we managed to get all of them on our side.” ■ SEE EDITORIAL P.307


SOURCE: GLOBAL CARBON PROJECT 
Cancer studies clash


Researchers debate relative importance of environmental 
and intrinsic factors in malignancy development. 


BY HEIDI LEDFORD 


Most cases of cancer result from external factors such as toxic chemicals and radiation, contends a study published online in Nature on 16 December (S. Wu et al. Nature http://dx.doi.org/10.1038/nature16166; 2015). The paper attempts to rebut an argument that arose early this year, when a report in Science concluded that differences in inherent cellular processes are the chief reason that some tissues become cancerous more frequently than others (C. Tomasetti and B. Vogelstein Science 347, 78–81; 2015).

The work led to assertions that certain forms of cancer are mainly the result of ‘bad luck’, and to suggestions that these types would be relatively resistant to prevention efforts. “There’s no question what’s at stake here,” says John Potter of the Fred Hutchinson Cancer Research Center in Seattle, Washington, who studies causes of cancer. “This informs whether or not we expend energy on prevention.”

In their Science paper, mathematician Cristian Tomasetti and cancer researcher Bert Vogelstein at Johns Hopkins University in Baltimore, Maryland, calculated the relationship between the number of stem-cell divisions and the risk of developing cancer in various tissues. Every instance of cell division comes with a risk that DNA will be incorrectly copied, leading to mutations — some of which could contribute to cancer. The duo’s analysis found a correlation: the more stem-cell divisions that occur in a given tissue over a lifetime, the more likely it is to become cancerous.

Tomasetti and Vogelstein then sorted types of cancer according to how much of the variability in risk is due to stem-cell divisions versus some extrinsic factor, such as environmental exposure to carcinogens. The authors argued that although some cancers clearly had strong environmental links — such as liver cancers caused by hepatitis C infection or lung cancer resulting from smoking — there were others for which the variation was explained mainly by defects in stem-cell division. In those cases, they argued, early detection and treatment would be more effective than prevention.

Something about that did not sit right with 
Yusuf Hannun, a cancer researcher at Stony 
Brook University in New York. “What they 
did was interesting, but I was startled by the 
conclusion,” he says. 

The original work, Hannun and his colleagues argue, assumed that the two variables — intrinsic stem-cell division rates and extrinsic factors — were entirely independent. But what if environmental exposures affect stem-cell division rates, as radiation is known to do?


A DIFFERENT TAKE 

Hannun and his team also used other lines of evidence to try to pinpoint the contribution of environmental factors to cancer risk. They looked at epidemiological data showing that, for example, people who migrate from regions of lower cancer risk to those with higher risk soon develop disease at rates consistent with their new homes. The authors also examined patterns in the mutations associated with certain cancers; ultraviolet light, for example, tends to create a tell-tale signature of mutations in DNA. And they used other mathematical models, expanding the data set used in the earlier work to include prostate and breast cancer — two of the most common cancers.

The models suggested that mutations during cell division rarely build up to the point of producing cancer, even in tissues with relatively high rates of cell division. In almost all cases, the team found that some exposure to carcinogens or other environmental factors would be needed to trigger disease.

Tomasetti counters that he never intended to explain why cancers develop. His analysis, he says, was based on normal stem-cell division in healthy tissue and was meant to explain only why some cancers are more prevalent than others. He also argues that the models created by Hannun and his colleagues make too many assumptions and fail to incorporate some features of tumour growth.

Some specialists in cancer prevention welcome the Nature paper because of fears that the public — and possibly also funders of scientific research — might conclude that prevention efforts are not worthwhile, says Edward Giovannucci, who studies cancer prevention at the Harvard T. H. Chan School of Public Health in Boston, Massachusetts. “By not smoking, your lifetime risk of lung adenocarcinoma drops dramatically,” he says. “The fact that your risk of pelvic sarcoma is even lower because there’s less stem-cell division — so what?” ■
Chinese fishing boats are pursued by a South Korean coastguard vessel (top right) in the Yellow Sea in 2011; the nations have overlapping claims in the region.


OCEANOGRAPHY 


Talks lift hopes in 
territorial impasse 


Negotiations between South Korea and China to demarcate 
Yellow Sea boundary could aid marine science. 


BY MARK ZASTROW, SEOUL 


China and South Korea have scheduled talks for 22 December to address a decades-long boundary dispute that has hampered research and exploration in the Yellow Sea. This northern part of the East China Sea, between mainland China and the Korean peninsula, is home to a rich ecosystem that is under intense environmental strain from human activities.

Confrontations over fishing rights in the disputed region have turned deadly — and research is not immune to the tension. South Korean scientists report that the Chinese coastguard has intercepted research vessels in the Yellow Sea and East China Sea on at least ten occasions, threatening their activities and forcing them to move east. At other times, the Chinese navy has shadowed South Korean research vessels. “The confrontations are happening all the time,” says marine sedimentologist Kyung-Sik Choi of Seoul National University.

The friction in the Yellow Sea is one of many marine territorial disputes in east Asia: over the past two years, China has captured the world’s attention with its construction of artificial islands in the South China Sea and a series of alleged rammings of local fishing boats by its coastguard and navy vessels. A spat with Japan over islands and gas fields in the East China Sea is also escalating, as China boosts its military presence and extraction efforts there.

In this particular case, both parties seem ready — at least publicly — to seek a solution. Chinese President Xi Jinping and South Korean President Park Geun-hye pledged in July 2014 to begin talks by the end of 2015.

“If the maritime boundary is fixed in some
way, it will be good for scientists because we 
will know exactly where our playground is,” 
says Hyun-Chul Han, a marine geologist at 
the Korea Institute of Geoscience and Mineral 
Resources in Daejeon. “It will be a great relief 
and secure scientists’ safety.” 

Few expect South Korea and China to fully resolve their dispute in this first round of talks. But some analysts say that boosting scientific ties between the nations in the Yellow Sea would be a feasible — and politically valuable — initial step.

“Maybe this could be an area of low-hanging fruit that these talks could address, to at least point to some level of utility and productiveness,” says James Schoff, a senior associate at the Carnegie Endowment for International Peace in Washington DC.


THE LAW OF THE SEA 

Under the 1982 United Nations Convention on the Law of the Sea, nations can claim exclusive rights to exploit resources in an exclusive economic zone (EEZ) within 200 nautical miles (370 kilometres) of their coasts. But because the Yellow Sea is less than 400 nautical miles in breadth, China’s and South Korea’s EEZs overlap, and the two countries have never agreed to a boundary (see ‘Troubled waters’). Research vessels from both countries avoid straying across a line of longitude about halfway between Seoul and Qingdao, effectively dividing the Chinese and South Korean marine-science communities. The law does not in principle restrict purely scientific activities in another nation’s EEZ, but in practice, countries can quickly set these zones off-limits to others.

Chinese data covering the Yellow Sea look 
“cut in half” because of the dispute, says 
Zuosheng Yang, a marine geologist at the Ocean 
University of China in Qingdao. 

In the past, China has rejected simply drawing a line that is equidistant from the two nations’ coasts. Instead, it claimed rights to about two-thirds of the Yellow Sea, based on the extent to which sediments billowing out from China’s Huang He and Yangtze rivers blanket the sea floor. This ‘silt line’ was met with howls of protest from South Korean scholars and received little international support. But the silt line has a practical significance: Chinese boats motor across it to escape the turgid, fish-poor sediment plumes, sometimes leading to fatal clashes with South Korea’s coastguard. In 2011, a Chinese fisherman stabbed a Korean coastguard officer to death with a shard of broken window glass; in a separate 2014 skirmish, the Korean coastguard shot and killed a Chinese fisherman.
The dispute has also prevented cooperation in assessing the deterioration of the Yellow Sea’s marine ecosystem. Dams in Chinese rivers have interrupted the once-steady flow of sediment and nutrients into the waters, and pollution has created enormous algal blooms. Urbanization has also claimed most of the tidal flats that once ringed the Yellow Sea basin, threatening key habitats for migratory birds.

Monitoring and management of the basin require collaboration, says Paul Liu, an oceanographer at North Carolina State University in Raleigh. South Korean and Chinese ocean researchers do share some data through a joint marine-research centre in Qingdao, which has held workshops and coordinated some work since 1995. But when asked about the boundary dispute, Wei Zheng, the centre’s vice-director, said: “It still is a problem.” She declined to comment further, citing the sensitivity of the issue.
Choi, for example, says that he and his colleagues would like to conduct a deep seismic survey transecting the entire Yellow Sea. But he says that the project would need permission and protection from China’s coastguard to prevent passing fishing boats from damaging the kilometres-long cables and attached equipment.

TROUBLED WATERS

Overlapping territorial claims by China and South Korea have interfered with marine science research in the Yellow Sea.

Map labels: China, Yellow Sea. Key: China exclusive economic zone (EEZ).

Both Liu and Yang say that an agreement would similarly foster collaborations to look at how sediments have swirled across the Yellow Sea in the past, and how new dams on China’s rivers have changed that process. “The Chinese cannot only study the western side, or Koreans cannot only study the eastern side,” Liu says. “They have to work together to know the whole picture of the area.” ■


Europe’s genetically edited 
plants stuck in legal limbo 


Scientists frustrated at delay in deciding if GM regulations apply to precision gene editing. 


BY ALISON ABBOTT 


Plant geneticist Stefan Jansson is champing at the bit to start field trials on crops tweaked with powerful gene-editing technologies. He plans to begin by using edits to study how the cress plant Arabidopsis protects its photosynthetic machinery from damage in excessively bright light.

But the future of his work depends on the 
European Commission’s answer to a legal 
conundrum. Should it regulate a gene-edited 
plant that has no foreign DNA as a genetically 
modified (GM) organism? 

Jansson, who works at Umeå University in Sweden, says that he will drop his experiments if the plants are classed as GM, because Europe’s onerous regulations would make his work too expensive and slow. He and many others are anxiously awaiting the commission’s decision, which will dictate how they approach experiments using the latest gene-editing techniques, including the popular CRISPR-Cas9 method.
The commission has repeatedly stalled on delivering its verdict, which will apply to edited animals and microorganisms as well as plants. It now says that it will make its legal analysis public by the end of March. Swedish authorities, meanwhile, have told Jansson that unless the commission specifies otherwise, they will not require his cress to be subject to GM regulations.


GENETIC EDITING 

The legal limbo is having a big impact on research, says René Smulders of the plant-breeding division at Wageningen University and Research Centre in the Netherlands. He says that this year, his application for a European Union grant to change the composition of a plant’s oils by editing a gene was rejected because referees were concerned about the legal uncertainty. “Some scientists hesitate to start using the new methods in case they end up being regulated and their research projects hit a dead end,” he says.

At issue is the interpretation of a 2001 European Commission directive on releasing GM organisms into the environment, which covers field trials and cultivation. It defines GM organisms as those carrying alterations, made by genetic engineering, that cannot occur naturally.

What is unclear is how this relates to experiments, such as Jansson’s, in which researchers introduce foreign DNA to direct a precise edit in a plant’s own genetic material but then use selective breeding to remove the foreign gene. The final plant has a few tweaked nucleotides, but cannot be distinguished from a wild plant that might have acquired the same mutation naturally — so it cannot be traced in the environment as EU regulations require.

Many EU member states — including Sweden — have conducted their own analyses of the directive, and argue that it should not apply to edited plants that do not contain foreign DNA. But some non-governmental organizations (NGOs) hostile to genetic manipulation have produced analyses that conclude the directive should apply because genetic engineering is involved.

Academic scientists and seed and crop companies fear that plants made with the latest gene-editing techniques may share the fate of conventional GM plants in Europe. Strict regulations, cumbersome bureaucracy and activism against GM organisms have meant that scientists in some countries, such as Germany, do not even attempt field trials. The regulations have increased the costs of bringing a GM crop to market, and many European nations do not allow such crops to be cultivated at all. That is frustrating for plant scientists who want their work to be useful to the world, says Jonathan Jones, a plant researcher at the Sainsbury Laboratory in Norwich, UK.

“We hoped that the new plant-breeding 
techniques would offer ways of achieving 
the same outcome without the onerous 
regulations — and fear that might not turn 
out to be the case,” he says. 

Many countries outside Europe do not 
face the same uncertainty, because they reg- 
ulate GM organisms according to the nature 
of the product, not how it was made. In the 
United States, gene-edited crops containing 
no foreign genetic material are assessed on a
case-by-case basis. In 2004, the biotechnol- 
ogy company Cibus, based in San Diego, 
California, was told that the US Department 
of Agriculture would not need to regulate its 
herbicide-resistant oilseed rape, made with 
an earlier form of gene-editing. Its crop is 
now cultivated in the United States. (The 
White House did, however, begin a review 
of all US biotechnology regulation in July.)

Since 2011, Cibus has asked six countries 
— Finland, Germany, Ireland, Spain, Sweden 
and the United Kingdom — whether they 
would consider its crop to come under the 
scope of the EU directive. Without guide- 
lines from the commission, each conducted 
its own analysis and said that it would not. 
Cibus has now done field trials in the United 
Kingdom and Sweden, but it put its activities 
on hold after the commission sent a letter 
to all EU member states on 15 June, asking 
them to wait for its legal interpretation. 

Whatever the commission decides, it is 
likely that either a member state, an NGO 
or a company will sue — meaning that the
European Court of Justice may make the 
final, binding decision on the matter. 

Many plant scientists do basic research, so 
their gene-edited plants never need to leave 
the greenhouse. But Jansson must plant his 
cress outside to test its photosynthetic abili- 
ties in natural conditions. With his country’s 
approval, he plans to plant the crop in the 
spring. “Lawyers talk and talk — I think it is 
important for Europe to have a test case,” he
says. ■ SEE EDITORIAL P.307
German researchers
pledge refugee help


Social scientists launch integration studies and warn of 
need to counter rising xenophobia. 


BY QUIRIN SCHIERMEIER 


After civil war broke out in Syria,
Mohammad Khamis lost his parents
and his home — but not his dream of
becoming a scientist. In July 2013, he boarded
a flight from Damascus, where he had studied
electrical engineering, to Egypt. In Alexandria,
he paid traffickers about €5,000 (US$5,500) for
a boat passage to Europe. The 9-day voyage to
the Italian island of Lampedusa, on an
unseaworthy sloop with 100 other desperate
refugees, was a nightmare of fear, vomit and thirst.

Two years later, Khamis, now 22, is attending
classes in maths, physics and chemistry
at the Technical University of Munich
(TUM) in Germany, where he sought asylum
in August 2013 and was last year accepted
as a war refugee. “There is no future for me
in Syria,” he says on a cold December day in
Munich. “I would like to stay here to study and
find a good research job. My dream is to
discover something new.”

Social scientists studying the flow of
refugees into Germany want to discover
something themselves: how many of the
incoming people are, like
Khamis, well-qualified, motivated and eager
to learn — a boon for the economy. These 
migration researchers say that Germany has 
become a case study in the difficulties of sud- 
denly integrating a large group of culturally 
diverse foreigners into a society; the nation has 
registered nearly one million asylum-seekers 
this year, more than half of them from Syria. It 
is the highest such influx in Western Europe. 

After a short-lived wave of hospitality in 
September, when chancellor Angela Merkel 
promised that Germany would be a welcom- 
ing host to the persecuted, many citizens and 
some right-leaning politicians have begun to 
voice concerns, painting a picture of a Muslim- 
dominated parallel society of poorly trained 
recipients of social welfare. 

Research may be able to counter the rising 
tide of xenophobia and aid the urgent pro- 
cess of resettling refugees by revealing more 
about migrants’ skills and cultural values, says 
David Schiefer, a Berlin-based psychologist
with a German advisory body on migration 
and integration who is planning interviews 
with refugees. “We need to give these people 
a voice,” he says. 

With about half of the newcomers under 
25 years of age, Germany’s higher-education 
and science systems have a particular obliga- 
tion — and the well-funded capacity — to help, 
say researchers. “Science has a responsibility 
to help tackle the huge integration challenge 
ahead,” says Alexander Kurz, head of human 
resources at the Fraunhofer Society in Munich, 
which runs centres for applied research. 
“There is great readiness among our staff of 
25,000 scientists from 100 nations to provide 
mentorship and practical help.”


LISTENING TO REFUGEES 

Reliable data on refugees’ qualifications and 
backgrounds are lacking. “We’re poking 
around in the fog,” says Ludger Wößmann, a
director of the Ifo Center for the Economics 
of Education in Munich. International assess- 
ments of 15-year-olds suggest that up to two- 
thirds of Syrian refugee students might lack 
basic reading, writing and maths skills, he says. 
German industrial groups say that the large 
majority of migrants have minimal skills and 
poor language abilities, making them hardly 
employable. 

But these assumptions are ill-informed, 
says Steven Vertovec, director of the Max 
Planck Institute for the Study of Religious 
and Ethnic Diversity in Göttingen. In fact, the
newcomers are probably as diverse as Ger- 
man society at large, he says. “There are many 
highly educated, secularized people among 
the Syrians, Iraqis and Afghans who are seek- 
ing asylum here.” 

Vertovec is leading a study in Lower Saxony 
in northern Germany that aims to interview 
asylum-seekers to examine their needs and 
aspirations, as well as to uncover best prac- 
tices for responding to refugees. The goal is to 
produce practical guidelines for city workers 
and volunteer social workers in asylum-seeker 
camps on how to work with groups of migrants 
who may differ enormously in age, religion, 
language and education status. “Successful 
integration requires a nuanced understanding 
of migrants’ backgrounds and values,” he says. 


Students such as Khamis (who officially has 
‘guest’ status at TUM; he is not yet formally 
enrolled in Germany’s university system) are 
not an uncommon sight in the country’s uni- 
versity lecture halls. TUM has about 100 guest 
students; across the country, there are a few 
thousand. To help universities to cope with the 
influx, the government in November approved 
an extra €100 million for student counselling, 
language training and stipends. 


GOVERNMENT SUPPORT 

On 11 December, Germany’s main research- 
funding agency, the DFG, encouraged grant 
holders to consider hiring refugee scientists in 
their research. DFG-funded scientists whose 
work would benefit from the participation of 
qualified academics or PhD students among 
the refugees are free to submit supplemental 
proposals for ‘guest funding’, said DFG
president Peter Strohschneider.

In a strategy paper seen by Nature, a group
from seven Max Planck institutes, in response 
to a call for research ideas by the society’s presi- 
dent, Martin Stratmann, has outlined a variety 
of research needs around humanitarian migra- 
tion, from international law and human-rights 
issues to health and gender studies. 

Marie-Claire Foblets, director of the Max 
Planck Institute for Social Anthropology in 
Halle, plans to ask a culturally diverse group
of refugees — including guest students at the
University of Halle-Wittenberg — for accounts
of their lives and experiences. Other questions,
such as those concerning refugees’ citizenship
and civil rights, the potential lure of extremism,
and the fate of children who might be
staying with radicalized parents, will require
the involvement of law experts, criminologists,
educators and others, she says.

Mohammad Khamis (centre), who left Syria in 2013, is now attending the Technical University of Munich.

Khamis, for one, is happy to write up the 
story of his life for research. Having passed 
German-language tests, he hopes to enrol at 
the university next term as a regular student. 
“Germany has been good to me,” he says. “Now
that my life can start again I do hope that I can
give something back.” ■ SEE EDITORIAL P.308
Myths that will not die


False beliefs and wishful thinking about the human experience 
are common. They are hurting people — and holding back science. 


BY MEGAN SCUDELLARI 


In 1997, physicians in southwest Korea
began to offer ultrasound screening for 
early detection of thyroid cancer. News 
of the programme spread, and soon phy- 
sicians around the region began to offer the 
service. Eventually it went nationwide, piggy- 
backing on a government initiative to screen 
for other cancers. Hundreds of thousands took 
the test for just US$30–50.

Across the country, detection of thyroid
cancer soared, from 5 cases per 100,000 people
in 1999 to 70 per 100,000 in 2011. Two-thirds
of those diagnosed had their thyroid glands
removed and were placed on lifelong drug 
regimens, both of which carry risks. 

Such a costly and extensive public-health 
programme might be expected to save lives. 
But this one did not. Thyroid cancer is now 
the most common type of cancer diagnosed in 
South Korea, but the number of people who die 
from it has remained exactly the same — about 
1 per 100,000. Even when some physicians in 
Korea realized this, and suggested that thy- 
roid screening be stopped in 2014, the Korean 
Thyroid Association, a professional society
of endocrinologists and thyroid surgeons, 
argued that screening and treatment were basic 
human rights. 

In Korea, as elsewhere, the idea that the early 
detection of any cancer saves lives had become 
an unshakeable belief. 

This blind faith in cancer screening is an 
example of how ideas about human biol- 
ogy and behaviour can persist among peo- 
ple — including scientists — even though the 
scientific evidence shows the concepts to be 
false. “Scientists think they’re too objective to
believe in something as folklore-ish as a myth,”
says Nicholas Spitzer, director of the Kavli 
Institute for Brain and Mind at the University 
of California, San Diego. Yet they do. 

These myths often blossom from a seed 
of a fact — early detection does save lives for 
some cancers — and thrive on human desires 
or anxieties, such as a fear of death. But they 
can do harm by, for instance, driving people 
to pursue unnecessary treatment or spend 
money on unproven products. They can also 
derail or forestall promising research by dis- 
tracting scientists or monopolizing funding. 
And dispelling them is tricky. 

Scientists should work to discredit myths, 
but they also have a responsibility to try to 
prevent new ones from arising, says Paul 
Howard-Jones, who studies neuroscience and 
education at the University of Bristol, UK. “We 
need to look deeper to understand how they 
come about in the first place and why they're 
so prevalent and persistent.” 

Some dangerous myths get plenty of air 
time: vaccines cause autism, HIV doesn’t
cause AIDS. But many others swirl about, too, 
harming people, sucking up money, muddying 
the scientific enterprise — or simply getting 
on scientists’ nerves. Here, Nature looks at the
origins and repercussions of five myths that 
refuse to die. 


MYTH 1: SCREENING SAVES LIVES FOR ALL 
TYPES OF CANCER 

Regular screening might be beneficial for some 
groups at risk of certain cancers, such as lung, 
cervical and colon, but this isn’t the case for all 
tests. Still, some patients and clinicians defend 
the ineffective ones fiercely. 

The belief that early detection saves lives 
originated in the early twentieth century, 
when doctors realized that they got the best 
outcomes when tumours were identified and 
treated just after the onset of symptoms. The 
next logical leap was to assume that the earlier 
a tumour was found, the better the chance of 
survival. “We've all been taught, since we were 
at our mother’s knee, the way to deal with
cancer is to find it early and cut it out,” says Otis
Brawley, chief medical officer for the American 
Cancer Society. 

But evidence from large randomized trials 
for cancers such as thyroid, prostate and breast 
has shown that early screening is not the
lifesaver it is often advertised as. For example,
a Cochrane review of five randomized 
controlled clinical trials totalling 341,342 par- 
ticipants found that screening did not
significantly decrease deaths due to prostate cancer¹.

“People seem to imagine the mere fact that 
you found a cancer so-called early must be a 
benefit. But that isn’t so at all,” says Anthony
Miller at the University of Toronto in Can- 
ada. Miller headed the Canadian National 
Breast Screening Study, a 25-year study of 
89,835 women aged 40–59 (ref. 2) that
found that annual mammograms did not 
reduce mortality from breast cancer. That's 
because some tumours will lead to death 
irrespective of when they are detected and 
treated. Meanwhile, aggressive early screen- 
ing has a slew of negative health effects. Many 
cancers grow slowly and will do no harm if 
left alone, so people end up having unnec- 
essary thyroidectomies, mastectomies and 
prostatectomies. So on a population level, 
the benefits (lives saved) do not outweigh the 
risks (lives lost or interrupted by unnecessary 
treatment). 


Still, individuals who have had a cancer 
detected and then removed are likely to feel 
that their life was saved, and these personal 
experiences help to keep the misconception 
alive. And oncologists routinely debate which
age groups, and which people with other risk
factors, would benefit from regular screening.

Focusing so much attention on the current 
screening tests comes at a cost for cancer 
research, says Brawley. “In breast cancer, 
we've spent so much time arguing about age 
40 versus age 50 and not about the fact that 
we need a better test,” such as one that could 
detect fast-growing rather than slow-growing 
tumours. And existing diagnostics should be 
rigorously tested to prove that they actually 
save lives, says epidemiologist John Ioannidis 
of the Stanford Prevention Research Center 
in California, who this year reported that 
very few screening tests for 19 major diseases 
actually reduced mortality³.

Changing behaviours will be tough. Gilbert 
Welch at the Dartmouth Institute for Health 
Policy and Clinical Practice in Lebanon, New 
Hampshire, says that individuals would rather 
be told to get a quick test every few years 
than be told to eat well and exercise to pre- 
vent cancer. “Screening has become an easy 
way for both doctor and patient to think they 
are doing something good for their health, but 
their risk of cancer hasn't changed at all.” 
MYTH 2: ANTIOXIDANTS ARE GOOD AND FREE
RADICALS ARE BAD 


In December 1945, chemist Denham 
Harman's wife suggested that he read an article 
in Ladies’ Home Journal entitled ‘Tomorrow 
You May Be Younger’. It sparked his interest 
in ageing, and years later, as a research associ- 
ate at the University of California, Berkeley, 
Harman had a thought “out of the blue”, as he
later recalled. Ageing, he proposed, is caused 
by free radicals, reactive molecules that build 
up in the body as by-products of metabolism 
and lead to cellular damage. 

Scientists rallied around the free-radical 
theory of ageing, including the corollary 
that antioxidants, molecules that neutralize 
free radicals, are good for human health. By 
the 1990s, many people were taking anti- 
oxidant supplements, such as vitamin C and 
β-carotene. It is “one of the few scientific
theories to have reached the public: gravity, 
relativity and that free radicals cause ageing, 
so one needs to have antioxidants”, says
Siegfried Hekimi, a biologist at McGill University
in Montreal, Canada. 

Yet in the early 2000s, scientists trying to 
build on the theory encountered bewilder- 
ing results: mice genetically engineered to 
overproduce free radicals lived just as long as 
normal mice⁴, and those engineered to
overproduce antioxidants didn’t live any longer
than normal⁵. It was the first of an onslaught
of negative data, which initially proved dif- 
ficult to publish. The free-radical theory “was 
like some sort of creature we were trying to 
kill. We kept firing bullets into it, and it just 
wouldn’t die,” says David Gems at University
College London, who started to publish his 
own negative results in 2003 (ref. 6). Then, 
one study in humans⁷ showed that antioxidant
supplements prevent the health-promoting
effects of exercise, and another associated
them with higher mortality⁸.

None of those results has slowed the global 
antioxidant market, which ranges from food 
and beverages to livestock feed additives. It is 
projected to grow from US$2.1 billion in 2013 
to $3.1 billion in 2020. “It’s a massive racket,” 
says Gems. “The reason the notion of oxidation
and ageing hangs around is because it is
perpetuated by people making money out of it.”

Today, most researchers working on ageing 
agree that free radicals can cause cellular dam- 
age, but that this seems to be a normal part of 
the body’s reaction to stress. Still, the field has 
wasted time and resources as a result. And the
idea still holds back publications on possible 
benefits of free radicals, says Michael Ristow, 
a metabolism researcher at the Swiss Federal 
Institute of Technology in Zurich, Switzer- 
land. “There is a significant body of evidence 
sitting in drawers and hard drives that sup- 
ports this concept, but people aren't putting it 
out,” he says. “It’s still a major problem.”

Some researchers also question the broader 
assumption that molecular damage of any 
kind causes ageing. “There’s a question mark 
about whether really the whole thing should 
be chucked out,” says Gems. The trouble, he 
says, is that “people don't know where to go 
now”. 


MYTH 3: HUMANS HAVE EXCEPTIONALLY 
LARGE BRAINS 

The human brain — with its remarkable 
cognition — is often considered to be the 
pinnacle of brain evolution. That dominance 
is often attributed to the brain's exceptionally 
large size in comparison to the body, as well 
as its density of neurons and supporting cells, 
called glia. 

None of that, however, is true. “We
cherry-pick the numbers that put us on top,” says Lori
Marino, a neuroscientist at Emory University 
in Atlanta, Georgia. Human brains are about 
seven times larger than one might expect rela- 
tive to similarly sized animals. But mice and 
dolphins have about the same proportions,
and some birds have a larger ratio.


MYTHS THAT PERSIST
Irksome misbeliefs

Nature polled doctors and scientists for
the medical myths that they find most
frustrating. Here’s what turned up.

Vaccines cause autism

Although there are some risks
associated with vaccines, the connection
to neurological disorders has been
debunked many times over.

Paracetamol (acetaminophen) works
through known mechanisms

Although it is widely used, there are only
hints as to how it and other common
drugs actually work.

The brain is walled off from the immune
system

The brain has its own immune cells, and
a lymphatic system that connects the
brain to the body’s immune system has
recently been discovered.

Homeopathy works.
It doesn’t.

“Human brains respect the rules of scaling. 
We have a scaled-up primate brain,” says 
Chet Sherwood, a biological anthropologist 
at George Washington University in Washing- 
ton DC. Even cell counts have been inflated: 
articles, reviews and textbooks often state 
that the human brain has 100 billion neurons. 
More accurate measures suggest that the
number is closer to 86 billion. That may sound 
like a rounding error, but 14 billion neurons is 
roughly the equivalent of two macaque brains. 

Human brains are different from those of 
other primates in other ways: Homo sapiens 
evolved an expanded cerebral cortex — the 
part of the brain involved in functions such as 
thought and language — and unique changes 
in neural structure and function in other areas 
of the brain. 

The myth that our brains are unique 
because of an exceptional number of neurons 
has done a disservice to neuroscience because 
other possible differences are rarely investi- 
gated, says Sherwood, pointing to the exam- 
ples of energy metabolism, rates of brain-cell 
development and long-range connectivity of 
neurons. “These are all places where you can 
find human differences, and they seem to be 
relatively unconnected to total numbers of 
neurons,” he says.

The field is starting to explore these topics. 
Projects such as the US National Institutes of 
Health’s Human Connectome Project and the 
Swiss Federal Institute of Technology in Laus- 
anne’s Blue Brain Project are now working to 
understand brain function through wiring 
patterns rather than size. 


MYTH 4: INDIVIDUALS LEARN BEST
WHEN TAUGHT IN THEIR PREFERRED
LEARNING STYLE

People attribute other mythical qualities to 
their unexceptionally large brains. One such 
myth is that individuals learn best when they 
are taught in the way they prefer to learn. A 
verbal learner, for example, supposedly learns 
best through oral instructions, whereas a vis- 
ual learner absorbs information most effec- 
tively through graphics and other diagrams. 

There are two truths at the core of this myth: 
many people have a preference for how they 
receive information, and evidence suggests 
that teachers achieve the best educational 
outcomes when they present information in 
multiple sensory modes. Couple that with peo- 
ple’s desire to learn and be considered unique, 
and conditions are ripe for myth-making. 

“Learning styles has got it all going for it: 
a seed of fact, emotional biases and wishful 
thinking,” says Howard-Jones. Yet just like 
sugar, pornography and television, “what you 
prefer is not always good for you or right for 
you,” says Paul Kirschner, an educational psy- 
chologist at the Open University of the Neth- 
erlands. 

In 2008, four cognitive neuroscientists 
reviewed the scientific evidence for and against 
learning styles. Only a few studies had rigorously 
put the ideas to the test and most of those that 
did showed that teaching in a person's preferred 
style had no beneficial effect on his or her learn- 
ing. “The contrast between the enormous pop- 
ularity of the learning-styles approach within 
education and the lack of credible evidence for its
utility is, in our opinion, striking and disturbing,”
the authors of one study wrote⁹.

That hasn't stopped a lucrative industry from 
pumping out books and tests for some 71 pro- 
posed learning styles. Scientists, too, perpetuate 
the myth, citing learning styles in more than 
360 papers during the past 5 years. “There are 
groups of researchers who still adhere to the 
idea, especially folks who developed ques- 
tionnaires and surveys for categorizing peo- 
ple. They have a strong vested interest,” says 
Richard Mayer, an educational psychologist 
at the University of California, Santa Barbara. 

In the past few decades, research into 
educational techniques has started to show 
that there are interventions that do improve 
learning, including getting students to sum- 
marize or explain concepts to themselves. And 
it seems almost all individuals, barring those 
with learning disabilities, learn best from a 
mixture of words and graphics, rather than 
either alone. 

Yet the learning-styles myth makes it 
difficult to get these evidence-backed con- 
cepts into classrooms. When Howard-Jones 
speaks to teachers to dispel the learning-styles 
myth, for example, they often don't like to hear 
what he has to say. “They have disillusioned 
faces. Teachers invested hope, time and effort 
in these ideas,” he says. “After that, they lose 
interest in the idea that science can support 
learning and teaching.”


MYTH 5: THE HUMAN POPULATION IS GROWING 
EXPONENTIALLY (AND WE’RE DOOMED) 

Fears about overpopulation began with 
Reverend Thomas Malthus in 1798, who pre- 
dicted that unchecked exponential population 
growth would lead to famine and poverty. 

But the human population has not grown and is
not growing exponentially, and it is unlikely
to do so, says Joel Cohen, a populations 
researcher at the Rockefeller University in 
New York City. The world’s population is now 
growing at just half the rate it was before 1965. 
Today there are an estimated 7.3 billion peo- 
ple, and that is projected to reach 9.7 billion 
by 2050. Yet beliefs that the rate of popula- 
tion growth will lead to some doomsday 
scenario have been continually perpetuated. 
Celebrated physicist Albert Bartlett, for exam- 
ple, gave more than 1,742 lectures on
exponential human population growth and its dire
consequences, starting in 1969.

The world’s population also has enough to 
eat. According to the Food and Agriculture 
Organization of the United Nations, the rate of 
global food production outstrips the growth of 
the population. People grow enough calories 
in cereals alone to feed between 10 billion and 
12 billion people. Yet hunger and malnutrition 
persist worldwide. This is because about 55% 
of the food grown is divided between feed- 
ing cattle, making fuel and other materials or 
going to waste, says Cohen. And what remains 
is not evenly distributed — the rich have
plenty, the poor have little. Likewise, water
is not scarce on a global scale, even though 
1.2 billion people live in areas where it is. 

“Overpopulation is really not
overpopulation. It’s a question about poverty,” says
Nicholas Eberstadt, a demographer at the 
American Enterprise Institute, a conserva- 
tive think tank based in Washington DC. Yet 
instead of examining why poverty exists and 
how to sustainably support a growing popula- 
tion, he says, social scientists and biologists 
talk past each other, debating definitions and 
causes of overpopulation. 

Cohen adds that “even people who know 
the facts use it as an excuse not to pay
attention to the problems we have right now”,
pointing to the example of economic systems 
that favour the wealthy. 

Like others interviewed for this article, 
Cohen is less than optimistic about the chances 
of dispelling the idea of overpopulation and 
other ubiquitous myths (see ‘Myths that per- 
sist’), but he agrees that it is worthwhile to 
try to prevent future misconceptions. Many 
myths have emerged after one researcher 
extrapolated beyond the narrow conclusions 
of another’s work, as was the case for free
radicals. That “interpretation creep”, as Spitzer calls
it, can lead to misconceptions that are hard to
excise. To prevent that, “we can make sure an
extrapolation is justified, that we’re not going
beyond the data”, suggests Spitzer. Beyond
that, it comes down to communication, says
Howard-Jones. Scientists need to be effective 
at communicating ideas and get away from 
simple, boiled-down messages. 

Because once a myth is here, it is often here 
to stay. Psychological studies suggest that the 
very act of attempting to dispel a myth leads 
to stronger attachment to it. In one experi- 
ment, exposure to pro-vaccination messages 
reduced parents’ intention to vaccinate their 
children in the United States. In another, cor- 
recting misleading claims from politicians 
increased false beliefs among those who 
already held them. “Myths are almost impos- 
sible to eradicate,” says Kirschner. “The more 
you disprove it, often the more hard core it 
becomes.” ■


Megan Scudellari is a science journalist in 
Boston, Massachusetts. 
Why synthesize? 
Philip Ball ponders the many reasons that chemists make molecules, 
and weighs what is lost, and gained, when they don’t. 


Why do chemists make molecules?
The obvious (and true) answer
is: because we need them. That is
why chemical synthesis is still vibrant, and
will continue to supply the drugs, materials 
and commodities of the twenty-first century. 
Every year brings its bounty. In 2015, chem- 
ists published a new and elegant route to the 
anticancer drug paclitaxel (Taxol), and syn- 
theses of a nodulisporic acid that might act 
as an insecticide’ and, in this journal, of an 
anti-HIV alkaloid’. 

There are also less utilitarian reasons 
for making molecules. One chemist might 
want to explore theoretical questions, such 
as what constitutes a bond. Another might 
delight in, and be curious about, the vari- 
ety of shapes and structures that molecules 
can have. That diversity of purpose is how it
should be. For at the root of the impulse to
build molecules is a deep, cherished belief 
that arguably distinguishes chemistry from 
other sciences: that there is an art in making, 
worth nurturing for its own sake.

Chemical synthesis can entail many
things — minor modification of existing 
molecular frameworks, for example, or 
making new materials. Total synthesis — the 
complete construction of a complex (often 
natural) molecule from simple reagents — 
has long been seen as the epitome of the art. 
But some say that the age of monumental 
projects to make complicated molecules is 
waning. These long and expensive proce- 
dures may produce tiny yields of the target 
molecule. And now there are automated 
methods that put molecules together;
eventually, even the synthetic route might be
planned automatically.

So, could bespoke, elaborate synthesis 
become a boutique rarity akin to the hand- 
crafting of books in the age of e-readers 
and print-on-demand? And if synthesis is 
relegated to a routine, should chemists be 
worried? 

Chemists periodically revisit (and revile) 
the argument over whether total synthe- 
sis is moribund, generally with more heat 
than light. It’s the wrong argument. Both 
the methods and motives of chemistry are 
evolving fast. We should be focusing on how 
synthesis responds. That response may be 
driven partly by pragmatism. But synthesis 
also has pedagogical and — unusually in 
a core scientific discipline — aesthetic
dimensions that must be factored into the equation.

There are several possible reasons to make
complex molecules by total synthesis. A
century ago the aim was often to identify a 
molecular structure, as in Robert Robinson's 
classic work on the synthesis of strychnine in 
the 1940s: if you know what happens at each 
step, you know what the end result looks like. 
That motive has vanished, however, thanks 
to advances in structural analysis, espe- 
cially crystallography and nuclear magnetic 
resonance spectroscopy. 

Another reason that chemists synthesized 
natural products was because of their useful 
properties. Molecules could be cheaper to 
make from scratch than to extract painstak- 
ingly from rare organisms. The total synthe- 
sis of the dye indigo in the 1870s that led to 
the collapse of the cultivation of the indigo 
plant is a canonical historical example. 

Today, most wholly synthetic routes to 
complex natural products are too compli- 
cated to be useful in themselves to the phar- 
maceutical industry. Even the celebrated 
total synthesis of paclitaxel in 1994 was never 
seriously expected to lead to a commercial 
route (it is now made semi-synthetically 
from a natural precursor, or by fermenta- 
tion). But total synthesis of a natural prod- 
uct can give chemists access to non-natural 
derivatives that might have pharmacological 
effects — as, for example, in the discovery of 
new antibiotics. 

What’s more, the grounding in synthetic 
chemical methods provided by making a 
complex natural molecule from scratch 
is said to equip students with the practi- 
cal skills that industry requires. Synthe- 
sis also cultivates an understanding of the 
basic principles of chemistry: how and why 
reactions occur, the relationships between 
molecular shape and function, and so on. 
An ability to synthesize molecules remains 
essential training for the next generation of 
chemists; it is simply part of the indispensa- 
ble core of the subject. By the same token, a 
lack of drawing skill does not make an artist 
bad but it makes them limited. 

Perhaps that’s why chemists with 
synthesizing skills are often said to get jobs 
in the pharmaceutical industry most easily. 
What is less clear is whether these skills can 
be learnt only by tackling fiendishly com- 
plicated structures. Indeed, Derek Lowe at 
Vertex Pharmaceuticals in Boston, Massa- 
chusetts, argues that drug companies value 
not the synthetic prowess per se but the con- 
comitant ability to solve problems fast — and 
to cope with the inevitable disappointments, 
because most drugs, like most organic reac- 
tions, do not work without a lot of tinkering. 

George Whitesides at Harvard University 
in Cambridge, Massachusetts, raises a dif- 
ferent concern. He worries that training US 
graduate students to do organic synthesis 
when most of it is now being done in China
risks equipping them for jobs that do not
exist. In this view, molecule-building is just
another kind of manufacturing technology:
if it can be done more cheaply elsewhere, 
it is best not even to try to compete, just to 
outsource. 

In any case, the utility of resulting skills 
and products is only part of the argument 
advanced for why chemical synthesis matters. 
Great synthetic chemists of the mid-to-late 
twentieth century, such as Robert Woodward 
and Elias Corey, are revered not so much for 
what they made but for how they made it: 
for the way they refined the art. Woodward 
argued that an innate aesthetic appeal is
involved: “The unique challenge which
chemical synthesis provides for the creative
imagination and the skilled hands ensures
that it will endure as long as men write books,
paint pictures, and fashion things which are
beautiful, or practical, or both”.

These notions are part of the lore of the 
field. Milestones of synthesis are recounted 
in heroic terms, their pathways examined 
step by step as exemplars of elegant strategy. 
The comparison is often made with games 
of chess: victory is seen as a triumph of per- 
sonal style and flair. One team of expert total 
synthesizers has more recently justified the 
pursuit by saying that it “demands the fol-
lowing virtues from, and cultivates the best
in, those who practice it: ingenuity, artistic
taste, experimental skill, persistence, and
character... its dual nature as precise sci-
ence and fine art provides excitement and
rewards of rare heights”. The baroque car-
bon frameworks that still grace the pages of
chemistry journals are often presented with
a virtuoso flourish.


BUILD IT WELL 
Nonetheless some chemists feel that total 
synthesis of large and complicated natural 
products has now become a scaling of peaks 
just because they are there — with, more- 
over, a meaningless race to the summit that is 
often won by brute force. Lowe calls this the 
“human-wave-attack style” of making gigan- 
tic natural products, which, he jokes, ends in 
papers reporting the total synthesis of a mol- 
ecule that no one much cares about, “made 
in a way you’d figure would probably work,
using reactions everyone already knows”. 
He contends that useful chemistry — a 
new method of making bonds, say — is 
rarely discovered along the way, partly 
because the field is so competitive. No one 
is going to dawdle to search for clever short- 
cuts if they can just follow tried and tested 
paths. When some enormous and intricate 
natural product becomes the next Everest, 
elegance is sacrificed for speed, and ingenu- 
ity for graduate-student hours, Lowe says. 

Advocates of total synthesis retort that
priority races and showboating — who can 
make the hardest molecule fastest — are 
less common now. The aim is no longer just 
to build the desired structure but to build 
it well. For example, chemists seek a route 
that is economical in atoms (producing few 
waste products and side reactions), environ- 
mentally friendly and sustainable. As Steven 
Ley of the University of Cambridge, UK, put 
it in 2007 after completing a 22-year effort 
to synthesize the complicated natural insec- 
ticide azadirachtin, “I don’t have to be first; 
the elegance of the approach is what interests 
me” (see Nature 448, 630-631; 2007).

Thanks to the efforts of the giants of syn- 
thesis past and present, almost any molecule 
can now be made in principle. The question 
is whether it can be made in a practical and 
fruitful way. 


COLLECTIVE COMPLEXITY 

To some chemists then, making complex 
molecules for their own sake no longer 
seems the pinnacle of craft. That arguably 
reflects changes in the objectives of chem-
istry as a whole. Whitesides has suggested
that if chemistry is regarded as a science of 
atoms and individual molecules, then its 
low-hanging fruits are gone. The future of 
chemistry, according to him, lies with com- 
plex molecular systems that display collec- 
tive properties and functions at a range of 
size scales. This may be the only means by 
which chemistry can fulfil its obligations in 
areas ranging from medicine to materials, 
energy and information. 

Take the much bewailed drying-up of the 
drugs pipeline. Although the reasons are 
complicated, one factor could be that the 
old model of developing and refining a single 
drug molecule by a long process of screen- 
ing and clinical trials is no longer the best 
option. The future of molecular medicine 
might instead include suites of molecules 
performing operations in concert, as bio- 
molecules do in the cell. This, after all, is how 
the transformative gene-editing technique of 
CRISPR-Cas9 works. 

Moreover, the complexity and versatil- 
ity of life’s molecules come not from a huge 
array of synthetic substrates and reactions, 
but from combinations of a rather small 
set of parts, assembled through a limited 
arsenal of bond-forming processes and 
guided by natural selection. Certainly, natu- 
ral products of extreme intricacy can result. 
But theoretical and experimental surveys of 
‘chemical space’ — the astronomical array of 
possible molecules — give no reason to think 
that ornate solutions are essential or unique. 

Complicated natural products with 
synthetically challenging frameworks do 
not tend to feature in nature’s methods of 
making or transforming energy, replicat-
ing, information processing, locomotion
or much else. Work like that of David Liu
and collaborators at Harvard shows that
nature’s synthetic principles of informa- 
tion-guided templating coupled to varia- 
tion and selection might be a productive 
way to make useful synthetic molecules. In 
fact, that approach has also yielded new ways 
of assembling them: new bond-forming
chemistry, which was found by explicitly 
looking for it and not by hoping that it would 
emerge in the course of scaling a molecular 
Eiger. Such work suggests that, even though 
molecule-building is sure to remain a crucial 
part of the chemical enterprise, conventional 
organic synthesis need not be the only, or 
even the best, way to do it. 


AUTOMATING THE ART 

One of the common criticisms of total 
synthesis is that it rarely offers a route that 
the chemical or pharmaceutical industries 
can use: it takes too long, there are too many 
steps, the yields are too low and the costs 
too high. If you want to make a complicated 
molecule, do you really need an army of ded- 
icated graduate students working through 
the night? Or could it be done by machine? 

Automated synthesis is already possible 
for peptides and nucleic acids, which can be 
obtained by mail order with essentially any 
sequence. Oligosaccharides are also yielding 
to this approach. As a result we have lucra- 
tive peptide and oligonucleotide drugs, and 
glycoprotein drugs are on the way. Work by
Martin Burke at the University of Illinois at 
Urbana-Champaign suggests that a great 
variety of small and medium-sized organic 
molecules could be made this way too. 

Burke uses a single, general-purpose 
reaction to assemble carbon-framework 
building blocks. He deploys the Suzuki cou- 
pling, in which a boronic acid substituent on 
one carbon reacts with a halogen substituent 
on the other in the presence of a palladium 
catalyst. The crucial trick is first to control 
this process for stepwise assembly, and
then to automate the procedure by trapping 
the products of each step on silica beads to 
extract and release them for the next step. It 
is not by any means possible to build every- 
thing this way. But the method gives access 
to an impressive array of molecules rapidly 
and cheaply at the push of a button. Burke 
and his colleagues have used it to make less 
toxic derivatives of the antifungal natural 
product amphotericin B. 

Automation is nothing new. Microfluidic 
flow processes for conducting multistep syn- 
theses without the need for purification at 
each step have been used for at least a dec- 
ade. And with a small repertoire of standard, 
reliable bond-forming reactions, even the 
synthetic strategy itself could conceivably 
now be planned by machine. 

The idea that synthesis could become the 
workaday cranking-out of any structure is
disturbing to anyone brought up to regard
it as an art. It seems akin to the notion that 
artificial intelligence will one day compose 
our music and write our novels. But the ‘art’ 
of chess has been overtaken by brute-force 
number-crunching. There is no fundamen- 
tal reason why chemical synthesis should be 
any different — nor, in fact, why machine- 
learning should not one day find superior, 
smoother and more efficient synthetic strat- 
egies than we can intuit (see Nature 512, 
20-22; 2014). 

If that happens, some magic would be lost.
But there could be practical gains. Today 
we need to make many molecules fast, to 
outpace the rise of antibiotic resistance, for 
example. This is acknowledged by the Dial- 
a-Molecule project, funded since 2010 by 
the UK Engineering and Physical Sciences 
Research Council, which aims to extend the 
assembly-line principle of oligonucleotide 
synthesis to any small organic molecule. 

The project's vision is that “In 20-40 years, 
scientists will be able to deliver any desired 
molecule within a timeframe useful to the 
end-user, using safe, economically viable 
and sustainable processes” (see www.dial- 
a-molecule.org). It aims to use computer 
algorithms to devise the best route for mak-
ing a target molecule with a suite of ‘click’
reactions, which are efficient, predictable
and dependable. The goal is to make any 
given molecule in a matter of days. 

Easier synthesis could free chemists to 
think creatively about molecular design: to 
focus on the question of what is worth mak- 
ing. That is currently the other big obstacle to 
effective drug discovery. As Burke explains, 
we do not yet know the rules that nature uses 
to ‘design’ complex natural products, in large 
part “because the process of trial and error 
in this complex chemical space is very slow 
due to barriers to synthesis”. 


HUMAN ENDEAVOUR 

Chemistry, then, shares a great deal 
with conventional manufacturing: it 
changes through innovations in design 
and fabrication. We don't make cars or
televisions the way we used to, so why should
molecules be any different? We need to avoid 
romanticizing an imagined bygone age, as 
the designer William Morris harkened back 
to the folk crafts of a fictitious Middle Ages. 

Better than making molecules more 
complicated or larger is making them more 
useful, and making them in more useful 
ways. Like architecture, chemistry deals 
in elegance in both design and execution. 
There has not been enough discussion of 
these aspects of the science: how they are 
manifested, how they motivate, how much 
they are worth conserving. 

In contemplating automated synthesis, for 
example, a comparison from mathematics 
comes to mind. There is debate over whether 
a mathematical proof should be celebrated 
for its own sake, regardless of method, or for 
its elegance and form — how it was done. 
Does ‘proof by machine’ count? Such ques- 
tions go to the heart of science as a human 
endeavour. We tell ourselves that the goals 
are knowledge and capability. But there are 
other things we value in it too.


Philip Ball is a freelance writer. His latest 
book is Invisible: The Dangerous Allure of 
the Unseen. 

e-mail: p.ball@btinternet.com 

In Medellin, Colombia, water tanks are being repurposed to create public spaces that offer classrooms, cafes and theatres.


URBAN STUDIES 


Blueprint for a cooperative city


Colin Ellard examines a study of the new urban paradigm that fosters ‘deep sharing’. 


As urbanizing countries grapple
with the need to provide sustain-
able energy and transport for their
creating a culture and economy of sharing. 
Many are commercial. The global home- 
rental ‘community’ Airbnb, for instance, has 
an estimated 60 million users in 34,000 cities. 
US-based transport company Uber, which 
links registered drivers with passengers by 
way of smartphones, is active in more than 
360 cities across 6 continents. Car-sharing 
services such as Zipcar are also widespread, 
attracting millennials who blanch at the 
costs of car ownership (environmental as 
well as financial). 

Commercially mediated sharing can have 
a dark side. Sharing can skew local econo- 
mies. Property owners turning to Airbnb 
may convert entire buildings to de facto 
hotels in cities such as New York, potentially 
contributing to housing crises. And Uber 
uses a surge-pricing algorithm to match 
supply and demand, meaning that users can 
face unpredictably high fares during periods 
of peak demand. 

There is an alternative: bottom-up 
ventures that are digital or based in commu- 
nities, rather than commercial. In Sharing 
Cities, environmental consultant Duncan 
McLaren and urban-policy scholar Julian
Agyeman lay out, with impressive depth,
clarity and wisdom, a comprehensive 
prescription for a sharing paradigm that 
incorporates such models. Noting that shar- 
ing has been a sociocultural and informal 
practice for millennia, McLaren and Agye- 
man also reveal the promise and pitfalls of 
such an approach at a time when neoliberal 
economic policies emphasizing individu- 
alistic profit often trump public goods and 
services. 

Sharing Cities: A Case for Truly Smart and Sustainable Cities
DUNCAN MCLAREN AND JULIAN AGYEMAN
MIT Press: 2016.

Sharing Cities explores the potential in
dense urban spaces for ‘deep sharing’ of
goods, resources, services, talent and expe-
rience through the Internet, with its rapid,
extensive linking of lenders and borrowers.
The models that the authors examine range
from barter clubs, credit unions, cooperative
land trusts and co-housing to online and other
peer-to-peer (P2P) networks and supper clubs.
As they point out, commercial services barely
scratch the surface of what is possible in a
true sharing economy. For example, decentral-
ized P2P networks such as TaskRabbit — in
which users can exchange skills and services 
without strong corporate oversight — can 
facilitate substantial sharing networks with 
minimal supervision. Public-transport sys- 
tems can be considered a form of sharing, 
because the costs of mobility are shared 
between many. 

Each chapter focuses on a particular 
aspect of sharing (production, consumption, 
politics, justice), and opens with a vignette 
of a city that exemplifies it. San Francisco, 
California — a hotbed of entrepreneurial 
start-up culture driven in part by Silicon Val- 
ley — is used to illustrate consumption. That 
kick-starts a discussion of open skills and 
knowledge sharing on online collaborative 
platforms that fly in the face of conventional 
commercial secrecy. 

Medellin, Colombia, is used as an exam- 
ple of sharing in the context of social jus- 
tice, as a result of the city’s spectacularly 
successful overturning of social margin- 
alization over the past decade. This has 
been achieved through an ongoing archi- 
tectural transforma- 
tion of water tanks 
into shared public 
spaces, as well as the 
introduction of its 
sustainable Metroplus bus rapid transit 
system. McLaren and Agyeman describe 
how other cities can foster inclusivity and 
sharing through prudent adjustments in 
policy and priorities, provision of open 
data and more thoroughgoing input from 
citizens at the grass-roots level. 

As I worked my way through each chap- 
ter, I rode a crest of optimism about the 
imminence of real change, only to crash 
back to reality as I realized how difficult it is 
to ensure that sharing transformations are 
transparent, equitable and just. The authors 
never flinch from tackling the complexities 
and contradictions inherent in these exam- 
ples. They present exquisitely balanced 
explanations of both the potential of sharing 
and its vulnerability to corruption by oppor- 
tunistic invaders seeking to maximize profit 
over fairness. 

In many cases, as McLaren and Agye- 
man show, overcoming conflicts between 
bottom-up and profit-driven sharing ven- 
tures demands reconfiguration of urban 
policy. Two examples of this are partici- 
patory budgeting, in which citizens share 
responsibility for allocating resources, and 
shared land ownership, which emphasizes a 
public commons. Both deter the exclusions 
often generated by gentrification. 

My only criticism is with one of the book’s 
key premises: that humans are evolutionar- 
ily predisposed to share across the board. 
The authors point to work in developmental 
psychology showing that babies are aware 
of fairness and injustice (M. F. H. Schmidt 
and J. A. Sommerville PLoS ONE 6, e23223; 
2011). Yet there is no shortage of evidence 
in evolutionary psychology — and everyday 
life — for the human tendency towards self- 
ishness under some circumstances, towards 
some classes of others. And theoretical work 
has suggested that under many conditions 
common in human society, cooperation is 
likely to collapse (A. J. Stewart and J. B. Plot- 
kin Proc. Natl Acad. Sci. USA 111, 17558- 
17563; 2014). Indeed, even the cited work by 
Schmidt and Sommerville shows that more 
than one-third of the infants in the study 
kept the best ‘loot’ for themselves. 

In part, such differences are surely what 
underlie the constant push-pull between 
new sharing paradigms and the ventures 
that co-opt and parasitize them. It would 
have helped the balance of McLaren and 
Agyeman’s argument to describe some of 
the seamy underbelly of our evolutionary 
heritage as well as the rosier side of our 
natures. 


Colin Ellard is a cognitive neuroscientist
at the University of Waterloo in Canada,
specializing in the study of the relationship 
between human psychology and urban 
design. His latest book is Places of the Heart. 
e-mail: cellard@uwaterloo.ca 


Books in brief 


A Natural History of Wine 

Ian Tattersall and Rob DeSalle YALE UNIVERSITY PRESS (2015)

Was science ever more intoxicating? This sparkling contribution
to the science of wine by palaeoanthropologist Ian Tattersall and
entomologist Rob DeSalle draws on a staggering array of disciplines, 
from neurobiology to physics. Starting at the putative cradle of wine- 
making — an Armenian cave containing a 6,000-year-old proto- 
winery — the two trawl the research on frugivorous higher primates’ 
putative hankering for fermented fruit; the bodily journey of a 
“wine-derived ethanol molecule”; and the impact of climate change 
on cultivation (J. Goode Nature 492, 351-353; 2012). 


White Eskimo: Knud Rasmussen’s Fearless Journey into the Heart 
of the Arctic 

Stephen R. Bown DA CAPO (2015)

The part-Inuit, part-Danish explorer Knud Rasmussen is famed
for his 32,000-kilometre Fifth Thule Expedition (1921-24) from
Hudson Bay to Alaska. But as Stephen Bown reveals in this masterful 
biography, he was also an Arctic Richard Francis Burton, publishing 
key anthropological works on Inuit culture in Canada and Greenland. 
Ultimately, Bown shows, Rasmussen became a scientist-bohemian 
“as comfortable in bearskin pants on a featureless wind-lashed plain 
as he was in a formal suit and bow tie attending the opera”. 


Slick Water: Fracking and One Insider’s Stand against the World’s 
Most Powerful Industry 

Andrew Nikiforuk GREYSTONE (2015) 

This meticulously researched study by journalist Andrew Nikiforuk 
lifts the lid on the costs of that vast geological-engineering experiment, 
fracking. It centres on Canadian environmental impact assessor 
Jessica Ernst, who in 2005 found explosive levels of methane in her 
well water, fingered the culprit as fracking and launched a legal battle. 
Interwoven with her story is a deft history of fracking from the 1850s 
(when torpedoes and nitroglycerin were used) through the 1960s 
(nuclear explosions) to modern hydraulic fracturing. 


The Hidden Half of Nature: The Microbial Roots of Life and Health 
David R. Montgomery and Anne Biklé W. W. NORTON (2015)

Soils and the human gut teem with microbes, and both communities 
need care and feeding to support, respectively, nutrient-rich crops 
and healthy immune systems. So emphasize geologist David 
Montgomery and biologist Anne Biklé in this beautifully synthesized 
scientific memoir. Personal experiences — revitalizing degraded 
garden soil and surviving a major health scare — become ways into 
swathes of cutting-edge research in microbiology, from agronomist 
Lorenz Hiltner’s work on “disease suppressive” soils to the Human 
Microbiome Project (see go.nature.com/tsty3t). 


The Snowflake: Winter’s Frozen Artistry 

Kenneth Libbrecht and Rachel Wing VOYAGEUR (2015) 

In 2003, physicist Kenneth Libbrecht (J. Hoffman Nature 480,
453-454; 2011) published the first edition of this aesthetic and
scientific celebration of the snowflake. With park ranger Rachel 
Wing, Libbrecht returns with fresh research, more advanced 
microphotographs and a history of snowflake imaging from Robert 
Hooke’s 1665 drawings to Wilson Bentley’s photographs, taken 
between 1885 and 1931. A gallery of jewels — Antarctic ‘diamond 
dust’, rococo stellar dendrites and beyond.
Barbara Kiser


SCIENCE COMMUNICATION


In the cross hairs of controversy 


Nancy Baron reviews a handbook for scientists keen to influence policy. 


In the world of science communi-
cation, academia has its own scar-
let letter: A, for Advocacy. Many
scientists shudder at the thought of
being branded advocates. As a result,
they can undermine the message
of their research by caveating every
assertion, or even avoiding inter-
action with the public — something
I have encountered many times as a
science-communication coach.

Lee Badgett is an academic who has
fought with that tension and come
out swinging. A professor of econom-
ics and policy, she is a veteran analyst
and public intellectual with decades
of experience in policy debates about
equality for lesbian, gay, bisexual and
transgender (LGBT) people. Her pithy
Twitter profile describes her life's work
as “Studying LGBT economic inequality to
figure out how to end it”. The Public Professor,
her third book, is an exhortation to scientists
to become “activist-scholars” like her.

Badgett intends to reverse-engineer
advocate-academics to teach others how to
galvanize policymakers, the media and the
community to pay attention to their research.
Her prescription for confronting injustice
in areas from civil rights to climate change
entails “injecting scholarship into important
debates, taking advantage of good timing,
being willing to handle disagreement” and
connecting with the public, activists and
policymakers.

Badgett breezes past reasons not to engage,
averring that as for bad news about advocacy,
“There really isn’t any.” As someone who
entered the fray as a union organizer during
her graduate degree, she draws on her own
experience to offer strategies for researchers
to inform legislative chambers, courtrooms,
businesses and social movements.

She recommends three steps for maximum
impact. First, examine the big picture, under-
stand the debate and master the rules of the
game, including determining your own role
in a conversation. If an economist wants to
recommend changes to the minimum wage,
for example, understanding the institutions
that regulate wage policy is essential. Identify-
ing the decision-makers, as well as what they
need and when, is crucial in working out how
to make your research relevant and timely.

Second, build a network in the social
spheres you hope to influence. Badgett
advises using your existing network for e-mail
introductions, or finding a hook to make a
cold call. A lawmaker who has introduced
legislation may be keen to hear about how
your research is relevant to it.


Climate scientist and activist James Hansen.

Third, practise the art of communicating 
with people outside your sphere. Prepare an 
elevator speech: what would you say to a US 
Congress member if you were in a lift with 
them for 30 seconds? Legislators are notori- 
ous for their short attention spans. The key is 
to distil your insights in a way that highlights 
their importance to the targeted person. 

The payback for all such efforts, Badgett 
rightly notes, is that they generate feedback 
and ideas that can inform future research 
questions and improve your teaching. 

Badgett advises academics who work 
on hot-button issues such as gay marriage, 
minimum wage or climate change to learn to 
manage conflict by seeing what lies behind 
it. Often it is politics, not science. The cost 
of avoidance, she writes, “is allowing others 
to dictate the debate and public outcome”. 
Nor, she says, is being in the thick of the 
argument as difficult as neophytes imag-
ine. In an intriguing section, “Developing a
Thicker Skin”, she writes: “for some scholars
who haven’t yet dipped their toe into the sea
of engagement, once you are all the way in,
you'll get used to the temperature”. The best
defence against attacks is to live and work
by basic, ethical principles, she advises.

The Public Professor: How to Use Your Research to Change the World
M. V. LEE BADGETT
New York University Press: 2015.

Her discussion of “sustainable engagement”
in the long term may prove especially helpful
to those with fears of a career-crushing
backlash from going
public. Badgett suggests addressing 
those concerns head on, and sooner 
rather than later. She does not fear the 
potential pitfalls, including obstacles 
to getting tenure, that can be a concern 
outside the social sciences. In ecology, 
for instance, public engagement can 
be viewed as a distraction from pub- 
lishing — perhaps a different reaction 
from that in sociology or economics. 

Badgett’s tenure portfolio included 
a note from Barney Frank, long-time 
Massachusetts congressional rep- 
resentative, thanking her for send- 
ing him an article that he entered 
into the Congressional Record. She 
recommends building a network of 
academics who can vouch for your 
contributions to public debates and 
convince a tenure committee that they are 
worthy of credit. I have seen this strategy 
work, but it is patchy and dependent on lead- 
ership within departments and institutions. 

I enjoyed reading about Badgett’s experi- 
ence, and would have welcomed more on 
lessons she has learned in the cross hairs. Her 
book skates over the surface of a large pond 
and sometimes feels on thin ice, with too few 
in-depth examples. Nor does it reference what 
I consider core reading, such as Cornelia 
Dean's Am I Making Myself Clear? (Harvard 
Univ. Press, 2012), an elegant book on the 
fundamentals of talking to journalists; Den- 
nis Meredith’s Explaining Research (Oxford 
Univ. Press, 2010), which is like having a top- 
notch public-information officer assigned to 
you; and Randy Olson’s Houston, We Have 
a Narrative (Univ. Chicago Press, 2015; see 
Nature 526, 321; 2015), an astute take on how 
to make science resonate through storytelling. 

The Public Professor pushes the bounda- 
ries for scientists thinking of taking the 
public plunge. It will also be instructive to 
the more restrained scientist, as defined by 
Roger Pielke in The Honest Broker (Cam- 
bridge Univ. Press, 2007; A. A. Rosenburg 
Nature 448, 867; 2007). Researchers may feel 
that their fields are more constrained than 
the social-justice issues that Badgett cham- 
pions, but The Public Professor has much to 
offer by exploring what is possible for those 
who want to change the world.


Nancy Baron is director of science outreach for 
communication-training service COMPASS 
and author of Escape from the Ivory Tower. 
e-mail: nbaron@compassonline.org 

Archives and citation
miss equal authors 


It is now common practice to 
include ‘equal contributions’ 
footnotes in papers that have 
multiple first or senior authors. 
Unfortunately, this information 
is not preserved by the archiving 
and citation processes. The 
omission diminishes the roles 
of equal but subsequently listed 
authors, discouraging scientists 
from working in collaborations 
and teams — the backbone of 
modern scientific progress. 

Equal-authorship details can 
currently be found only in the 
papers themselves. These details 
are not available on indexing 
sites or in referenced citations, 
which are increasingly the 
main source of information 
for literature searches. 

To rectify this oversight, 
indexers need publishers to code 
author-status information in a
standard format. For example, 
journals could include an 
asterisk beside authors’ names 
to indicate equal contributions 
for article-citation purposes. We 
call on all journals and indexers 
such as PubMed, Google Scholar 
and Thomson Reuters Web of 
Science to update their systems 
to reflect shared authorship. 
Brian D. Brown, Miriam Merad 
Icahn School of Medicine at 
Mount Sinai, New York, USA. 
brian.brown@mssm.edu 


Design buildings for 
rapid evacuation 


In today’s terrorism-prone 
world (see, for example, Nature 
528, 7-8; 2015 and Nature 528, 
20-21; 2015), it is becoming 
increasingly important to ensure 
that buildings are designed to 
be speedily evacuated in an 
emergency. 

Evacuation modelling is 
a relatively new field that 
uses computational tools to 
predict human behaviour in a 
stricken building. Algorithms 
represent the range of people's 
possible reactions in the event
of such a disaster (see, for
example, E. D. Kuligowski
et al. US National Institute of
Standards and Technology
Technical Note 1680; 2010).
Models provide information on 
optimal evacuation strategies 
and allow buildings to be tested 
using real and hypothetical 
evacuation scenarios. 
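As a rough illustration of the kind of calculation such models perform, the toy sketch below treats each occupant's reaction as a random pre-movement delay, adds walking time to a single exit, and lets that exit pass only a limited number of people per second. It is not one of the models reviewed in the NIST note: the walking speed, delay distribution, exit flow rates and function names are hypothetical assumptions, chosen only to show how a design change (here, a wider exit) can be tested across many simulated scenarios.

```python
import random


def simulate_evacuation(distances_m, exit_flow_per_s, rng,
                        walk_speed_mps=1.2, premove_mean_s=60.0, premove_sd_s=20.0):
    """Return the time (s) at which the last occupant clears a single exit.

    Toy model with hypothetical parameters: each occupant spends a random
    pre-movement time reacting, walks to the exit, then queues at an exit
    that passes at most `exit_flow_per_s` people per second.
    """
    arrivals = []
    for d in distances_m:
        # Reaction times differ between people; clamp at zero.
        premove = max(0.0, rng.gauss(premove_mean_s, premove_sd_s))
        arrivals.append(premove + d / walk_speed_mps)
    arrivals.sort()
    headway = 1.0 / exit_flow_per_s   # minimum spacing between people leaving
    last_exit = float("-inf")
    for t in arrivals:
        # Leave on arrival, or once the exit frees up, whichever is later.
        last_exit = max(t, last_exit + headway)
    return last_exit


def evacuation_time_percentile(distances_m, exit_flow_per_s, runs=500, seed=1, q=0.95):
    """Repeat the stochastic scenario and report a high percentile of evacuation time."""
    rng = random.Random(seed)
    times = sorted(simulate_evacuation(distances_m, exit_flow_per_s, rng)
                   for _ in range(runs))
    return times[int(q * (runs - 1))]


distances = [5 + 45 * i / 199 for i in range(200)]   # 200 occupants, 5-50 m from the exit
narrow = evacuation_time_percentile(distances, exit_flow_per_s=0.5)
wide = evacuation_time_percentile(distances, exit_flow_per_s=2.0)
print(f"95th-percentile evacuation time: narrow exit {narrow:.0f} s, wide exit {wide:.0f} s")
```

Both runs reuse the same random seed, so the comparison isolates the design change; with a narrow exit the queue alone forces at least (200 − 1) / 0.5 = 398 s.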

Making evacuation modelling 
mandatory in the design and 
assessment of existing and 
planned buildings that could 
be at risk would minimize the 
impact of attacks on occupants. 
Enrico Ronchi Lund University, 
Sweden. 
enrico.ronchi@brand.lth.se


Common doctorates 
across Europe 


The German medical doctorate 
system is not the only element 
that needs changing to overcome 
the ills that you discuss (see 
Nature 527, 7; 2015). In our 
view, a European approach 
offers the best cure. 

We suggest that Germany's 
medical degree should be 
modified to lead to a common 
European medical qualification: 
the vocational degree of Doctor 
of Medicine. Postgraduate 
medical research should be part 
of a different common European 
qualification: the academic 
degree of Doctor of Philosophy. 

Scientific quality would be 
guaranteed if the criteria for 
attaining degrees were to be 
standardized across Europe, 
and if specialist postgraduate 
medical colleges were widely 
set up. Students and clinicians 
would then also be able to 
pursue their divergent scientific 
interests more easily. 

A European core curriculum 
devised along these lines would 
reduce excessive pressure on 
students, enhance the mobility 
of students and graduates, and 
foster the growth of excellent 
health care and science. The 
International Federation of 
Medical Students’ Associations 
and the European Medical
Students’ Association have
already laid the foundations for 
such a curriculum (see J. Hilgers 
et al. Med. Teach. 29, 270-275; 
2007). 

Stefan U. Hardt, Jannis 
Papazoglou, Benedikt W. Pelzer 
European Medical Students’ 
Association, Brussels, Belgium. 
pmo@emsa-europe.eu 


Clean energy enters 
virtuous cycle 


Governments promised on
30 November to almost double
global funding for clean-energy
research (see go.nature.com/n4qdsw).
Meanwhile, the very
act of deploying emissions- 
cutting technologies to meet 
countries’ climate pledges at the 
recent United Nations summit 
in Paris is likely to spur major 
innovation. 

Such technological advances 
mean that cutting emissions can 
drive down the cost of further 
cuts in emissions (see go.nature.com/j8ueaj).
For example, the
price of photovoltaic modules 
for solar energy has fallen by 
85% since 2000 as markets have 
grown; electricity costs from 
wind are now comparable to 
those from coal; and energy- 
storage technologies are 
improving. 
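One standard way to describe how deployment drives down costs is an experience curve (Wright's law), in which each doubling of cumulative production cuts unit cost by a fixed fraction, the learning rate. The letter does not name a model, so this is an assumed illustration: the 20% learning rate is a hypothetical value, used only to ask how many doublings of cumulative production an 85% price fall would correspond to.

```python
import math


def wright_cost(initial_cost: float, cumulative_ratio: float, learning_rate: float) -> float:
    """Unit cost after cumulative production grows by a factor `cumulative_ratio`.

    Wright's law: every doubling of cumulative production multiplies unit cost
    by (1 - learning_rate); e.g. learning_rate=0.2 means 20% cheaper per doubling.
    """
    b = -math.log2(1.0 - learning_rate)   # experience exponent
    return initial_cost * cumulative_ratio ** (-b)


def doublings_for_decline(fractional_decline: float, learning_rate: float) -> float:
    """Doublings of cumulative production needed to cut unit cost by `fractional_decline`."""
    return math.log(1.0 - fractional_decline) / math.log(1.0 - learning_rate)


# Hypothetical 20% learning rate; the 85% decline is the figure quoted in the letter.
n = doublings_for_decline(0.85, 0.20)
print(f"{n:.1f} doublings, i.e. cumulative production multiplied by about {2 ** n:.0f}")
```

Under this assumption, roughly eight and a half doublings of cumulative module production would account for an 85% fall; a different learning rate changes the answer, which is why the value here is only illustrative.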

Publicly funded research and 
development, early investment 
by the private sector, and efficient 
deployment are all crucial for 
innovation. Market growth 
in renewable energy is largely 
driven by government policies, 
which have unleashed private 
companies’ research ingenuity 
and achieved economies of 
scale and greater productivity 
(see also J. E. Trancik Nature 507, 
300-302; 2014). 

Recognizing the mutual 
reinforcement of cutting 
emissions and improving clean 
energy is essential for negotiating 
a long-term, ambitious climate 
deal. As global efforts add up, 
falling costs should allow for an 
international agreement to phase 
in emissions cuts at a rate that
matches each nation’s stage of
economic development. 

Jessika E. Trancik Massachusetts 
Institute of Technology, 
Cambridge, USA. 
trancik@mit.edu 


Crowdfunded trials 
doubly scrutinized 


We disagree that crowdfunding 
of clinical trials is ethically 
questionable (P. Y. Cheah Nature 
527, 446; 2015). Participants
are still governed by the same
high standards of research 
integrity as traditionally funded 
recipients — but with the added 
scrutiny that comes with public 
engagement (see, for example, 
N. Siva Lancet 384, 1085-1086; 
2014). 

Cheah criticizes crowdfunding 
of clinical trials because it risks 
backing studies that are of limited 
importance and applicability. 
However, it is this very feature 
that offers an opportunity to fund 
trials for rare or emerging tropical 
diseases that might not otherwise 
attract financial support (see, for 
example, T. S. van der Werf et al. 
Bull. World Health Organ. 83, 
785-791; 2005). 

David Hawkes University of 
Melbourne; and Florey Institute of 
Neuroscience and Mental Health, 
Victoria, Australia. 

Melanie Thomson Deakin 
University, Victoria, Australia. 
m.thomson@deakin.edu.au 


CORRECTION 

The Outlook article ‘Research 
without prejudice’ (Nature 525, 
S12-S13; 2015) incorrectly
stated that the approval of the
US National Institute on Drug
Abuse (NIDA) is required for
US cannabis trials. In fact, NIDA
provides cannabis for every
project that has completed
the government-mandated
approval process. The article 
also implied that NIDA was 
holding up the start of a trial led 
by Sue Sisley, but the delay is 
caused by other circumstances. 
Recovery as nitrogen declines


Pollution from atmospheric nitrogen deposition is a major threat to biodiversity. The 160-year-old Park Grass experiment 
has uniquely documented this threat and demonstrated how nitrogen reductions lead to recovery. SEE LETTER P.401 


DAVID TILMAN & FOREST ISBELL 


Although greater availability of
a scarce nutrient might seem
beneficial for all plant species,
this is not so. Even in seemingly pris- 
tine and protected ecosystems, large 
losses of plant diversity can be caused 
by the addition of a nutrient that limits
plant growth’. One documented exam- 
ple is the effect of deposition on land 
of nitrogen that was released into the 
atmosphere by fossil-fuel combustion 
and agriculture. However, it has been 
unclear whether plant diversity will 
recover when nitrogen emissions are 
reduced or whether additional resto- 
ration practices are required. On page 
401 of this issue, Storkey et al.’ use the 
unparalleled long-term data of the Park 
Grass experiment to show that plant 
diversity recovers as nitrogen deposi- 
tion decreases. 

The Park Grass experiment at 
Rothamsted Research in Harpenden, 
UK, was started in 1856 and is the 
longest-running study of grassland in 
the world (Fig. 1). By comparing fer- 
tilized and control (never fertilized) 
plots, the authors observed that plant 
diversity declined to about 30% of its 
original level during 135 years of nitro- 
gen fertilization, but returned to about 
70% of its original level two decades 
after fertilization was halted. Moreo- 
ver, plant diversity declined in unfer- 
tilized control plots to about 50% of 
its original level as atmospheric nitro- 
gen deposition increased from 1950 to 1985. 
Then, when the introduction of cleaner tech- 
nologies greatly decreased nitrogen deposition 
from 1985 to 2012, plant diversity increased to 
about 80% of its original level. In both recover- 
ies, plant communities tended to regain their 
former species compositions. 

These observations contrast with results 
of a grassland experiment in Minnesota in 
which little, if any, recovery had occurred 
two decades after cessation of high rates of 
nitrogen fertilization*. Storkey and collabo- 
rators suggest the intriguing possibility that 
this difference arises because the Park
Grass plots have been hayed (the grass cut,
dried and removed) twice each year since the
experiment started, whereas the Minnesota
plots were never hayed. Why might this
matter? Haying removes biomass and its nitro-
gen. If not removed, excess nitrogen that had
accumulated in an ecosystem would recycle
within that system, thereby retaining its eco-
logical impacts long after nitrogen addition
slowed or ceased.

Figure 1 | The Park Grass experiment. This field experiment in
Hertfordshire, UK, has been running since 1856. Its division of
plants into control or treated plots has been used to test the effects
of various interventions on agricultural productivity, such as
fertilization and altered soil pH. Storkey et al.’ used data from the
experiment to document declines in plant biodiversity in response
to nitrogen accumulation, but also found that diversity recovers as
nitrogen levels decrease.

Park Grass hay contains 1.5-2% nitrogen (see
Fig. 2 of the paper’), and so annual removal of
around 2 and 5 tonnes per hectare of hay from
the control and fertilized plots, respectively,
probably removed around 35 and 90
kilograms of nitrogen per hectare per
year. This removal occurred alongside
a reduction in nitrogen deposition of
around 25 kg ha⁻¹ yr⁻¹ from its peak for
the control plots, and an additional
reduction of 96 kg ha⁻¹ yr⁻¹ for the plots
that stopped receiving fertilizer. The
removals of nitrogen through haying
possibly hastened the plots’ recovery.

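The nitrogen-removal estimates follow from simple arithmetic: hay yield, converted from tonnes to kilograms, multiplied by the nitrogen mass fraction of the hay. A minimal sketch using the 1.5-2% range and the 2 and 5 t ha⁻¹ yields quoted above; the function name is illustrative only.

```python
def nitrogen_removed_kg_per_ha(hay_t_per_ha: float, n_fraction: float) -> float:
    """Nitrogen (kg per hectare per year) exported in harvested hay."""
    return hay_t_per_ha * 1000.0 * n_fraction   # tonnes -> kg, times N mass fraction


for label, hay in [("control", 2.0), ("fertilized", 5.0)]:
    low = nitrogen_removed_kg_per_ha(hay, 0.015)    # 1.5% N
    high = nitrogen_removed_kg_per_ha(hay, 0.020)   # 2% N
    mid = nitrogen_removed_kg_per_ha(hay, 0.0175)   # midpoint of the range
    print(f"{label}: {low:.0f}-{high:.0f} kg N/ha/yr (midpoint {mid:.1f})")
# control: 30-40 kg N/ha/yr (midpoint 35.0)
# fertilized: 75-100 kg N/ha/yr (midpoint 87.5)
```

The midpoints round to the "around 35 and 90" kilograms per hectare given in the text.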
The reason why increased availability 
of limiting nutrients can cause biodiver- 
sity losses lies in the evolutionary trade- 
offs that cause species to be specialists. 
Adaptations that increase the ability of 
a given plant species to compete for one 
limiting resource come at a cost to the 
species’ capacity to deal with limitation 
by another resource or factor’. Thus, an 
increased supply of one resource should 
lead to the competitive displacement of 
those species that are superior competi- 
tors for the enriched resource, because 
they are among the poorer competitors 
for the new limiting factor. In theory, 
if enrichment led to accumulation of a
formerly limiting nutrient, both cessa-
tion of its addition and decreases in its
stores would be needed for that nutrient
to again become limiting and for bio-
diversity to begin recovering.

Although further tests are needed
to confirm that haying helped the 
Park Grass plots’ recovery, the idea is 
supported by several other examples. 
Human activities release more avail- 
able nitrogen and phosphorus than 
all natural terrestrial processes com- 
bined**, and accumulation of these nutrients 
can cause dramatic shifts in species composi- 
tions and biodiversity in terrestrial and aquatic 
ecosystems. For instance, high-diversity heath- 
lands in the Netherlands and Germany were 
replaced by low-diversity grasslands as nitro- 
gen deposition reached higher rates than those 
in Britain’’®. Successful restoration of these 
heathlands often required physical removal 
of both vegetation and topsoil’. Similarly, 
40 square kilometres of vegetable farmland in 
south Florida became a virtual monoculture 
of Brazilian pepper trees after it became part 
of Everglades National Park and agriculture
ceased in 1975. Attempts to restore the pre-
agricultural ecosystem were futile until both
the invasive trees and the fertilized agricultural
soil were removed'*"”. For phosphorus-lim-
ited lake ecosystems, reduction of phosphorus
inputs can be insufficient for lake recovery if
excess phosphorus inputs from agriculture are
retained and recycled”.

ROTHAMSTED RESEARCH

These cases suggest that both reduction of 
nutrient inputs and removal of any large stores 
of accumulated nutrients may be required for 
restoration of native ecosystems. Some ter- 
restrial restorations also require liming to 
overcome soil acidification, and seed addition 
when formerly abundant plant species are 
absent™!!°. However, it is not yet clear how 
the magnitude of increases in nitrogen stores 
influences the recovery of grassland diversity 
after nitrogen addition decreases or ceases*””*. 

The insights from the Park Grass experi- 
ment, together with results from earlier stud- 
ies, show that biodiversity can recover even 
after chronic high rates of nutrient pollution, 
and suggest that this recovery may be hastened 
by, or perhaps require, management practices 
that reduce accumulated nutrient stores. 
Moreover, they suggest that haying, a much gen-
tler practice than destructive removal of both
vegetation and soil, may reduce nutrient stores 
sufficiently to allow grassland diversity to 
recover. Finally, Storkey and colleagues’ work 
demonstrates the great value that long-term 
studies can provide in identifying solutions to 
environmental problems.


David Tilman and Forest Isbell are in
the Department of Ecology, Evolution and
Behavior, University of Minnesota, St Paul, 
Minnesota 55108, USA. D.T. is also in the 
Bren School of Environmental Science and 
Management, University of California, 
Santa Barbara. 

e-mails: tilman@umn.edu; isbell@umn.edu 
Entanglement beyond
identical ions 


Control of quantum particles has been extended to enable different types of ion to 
be entangled — correlated in a non-classical way. This opens up opportunities for 
the development of new quantum technologies. SEE LETTERS P.380 & P.384 


TOBIAS SCHAETZ 


Entanglement is a peculiar phenomenon
that causes two or more particles to share
one common state, such that each particle
can no longer be described independently. In 
this issue, Tan et al.’ (page 380) and Ballance 
et al.” (page 384) report entangled pairs of ions 
consisting of two different atomic species — the 
first time that this has been achieved. They used 
the resulting systems to test the puzzling pre- 
dictions of quantum mechanics with unprec- 
edented accuracy. This in turn allowed them 
to benchmark trapped ions as an experimen- 
tal platform for quantum technology, and to 
assess the platform's prospects to further exploit 
quantum effects for applications such as atomic 
clocks and quantum computation. 

Quantum mechanics requires objects to be 
able to exist in two states simultaneously, even if 
the states are mutually exclusive. To picture such 
a superposition, imagine the magnetic needle 
of a hypothetical quantum compass pointing 
north and south at the same time. A measure- 
ment that determines the state of the needle will 
project it into one of its two possibilities at ran- 
dom — the result is not just unknown, but not 
determined before the measurement. 

If there are two quantum magnetic needles, 
they can become entangled. For entangled 
objects, a measurement on one object that 
produces a completely random output instan- 
taneously determines the potential result of the 
second object (or vice versa). The effect of the 
measurement is immediate and is independent 
of the distance between the objects. 

Einstein was one of the founding fathers of 
the theory of quantum mechanics, but he and 
his colleagues realized that the consequences 
of entanglement severely violate intuition and 
logical conclusions based on the classical inter- 
pretation of nature. Einstein and others there- 
fore proposed some seminal experiments’ that 
could be used to show that their theory was 
far from complete. But because the practical
prerequisites for the experiments seemed
to exceed the capabilities of any researcher,
even in the future, they called their proposals
Gedankenexperimente (‘thought experiments’).

Figure 1 | Entangling two different ions.
Individual ions can exist in one of two quantum
‘spin’ states: spin up (↑) and spin down (↓).
Quantum mechanics also allows ions to form
a superposition state (↑+↓) in which both the ↑
and ↓ states coexist. Tan et al.' and Ballance et al.’
have prepared entangled pairs of ions consisting
of two different atomic types (either different
elements or different isotopes of an element). Each
ion seems to be in a ↑+↓ state (shaded regions), but
entanglement generates a correlated state (↑↑+↓↓,
bounded by dashed lines; dotted lines indicate spin
correlations), which means that a measurement
of one of the two ions instantaneously affects
the state of the other; that is, the two formerly
independent ions have to be considered as a whole.

Tan et al. and Ballance et al. report that 
quantum mechanics is accurate even when 
non-identical objects are entangled. Tan and 
colleagues entangled a beryllium-9 ion (⁹Be⁺)
and a magnesium-25 ion (²⁵Mg⁺), whereas
Ballance and co-workers used two isotopes
of calcium, ⁴⁰Ca⁺ and ⁴³Ca⁺. To describe how
both groups created entanglement, consider
the ions in each pair as magnetic needles that
can point in one of two directions. This behav-
iour is analogous to that of a particle that has
a spin value of +½ or −½; discrete spin values
are a quantum form of angular momentum.
Applying appropriate optical fields generated 
by laser beams, or microwave fields, mediates a 
ferromagnetic interaction that aligns the spins. 
In other words, if the first ion is prepared and
kept in the ‘northward-pointing’ spin-up (↑)
state, then the interaction puts the second spin
into the ↑ state too.

In a similar way, the authors prepared the 
first spin in a superposition state (↑+↓) by
switching off the spin-rotating microwave or
laser fields after 90° of rotation. The research-
ers then induced the ferromagnetic interaction
described above. This orients the second spin
into an entangled superposition state of ferro-
magnetic order (↑↑+↓↓); the ↑ part of the first
ion’s superposition state rotates the second ion
into ↑, and the ↓ part rotates the second ion into
↓ (Fig. 1). The quantum nature of the created
correlation became evident when the research- 
ers took measurements of only the first ion’s 
spin. The outcome was completely random but 
instantaneously determined the outcome of a 
subsequent measurement of the second spin — 
the outcome of the second measurement was 
almost always identical to that of the first. 
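The two-step protocol described above (a 90° rotation of the first spin, then an interaction that aligns the second spin with the first) can be mimicked with a four-amplitude state vector. In this sketch the aligning interaction is idealized as a controlled-NOT gate acting on a second spin that starts in ↓. That gate is a textbook stand-in assumed for illustration, not the laser- or microwave-driven interaction either group actually used. Sampling joint measurements reproduces the hallmark described in the text: each outcome is random, yet the two spins always agree.

```python
import math
import random

# Two-spin basis states, indexed as 2*first + second, with 1 = up and 0 = down.
LABELS = ["↓↓", "↓↑", "↑↓", "↑↑"]


def rotate_first_90(state):
    """Rotate the first spin by 90°: ↓ -> (↓ + ↑)/√2 and ↑ -> (↑ - ↓)/√2."""
    c = s = 1 / math.sqrt(2)
    new = [0.0] * 4
    for second in (0, 1):
        down, up = state[second], state[2 + second]
        new[second] = c * down - s * up
        new[2 + second] = s * down + c * up
    return new


def align_second_to_first(state):
    """Idealized aligning step (a CNOT): flip the second spin when the first is ↑.

    With the second spin starting in ↓, this maps ↑↓ -> ↑↑ and leaves ↓↓ alone.
    """
    new = list(state)
    new[2], new[3] = state[3], state[2]
    return new


def measure(state, rng):
    """Sample one joint outcome using Born-rule probabilities |amplitude|^2."""
    r, cumulative = rng.random(), 0.0
    for label, amp in zip(LABELS, state):
        cumulative += amp * amp
        if r < cumulative:
            return label
    return LABELS[-1]   # guard against rounding when r is very close to 1


state = align_second_to_first(rotate_first_90([1.0, 0.0, 0.0, 0.0]))  # start in ↓↓
print({label: round(amp * amp, 3) for label, amp in zip(LABELS, state)})
# {'↓↓': 0.5, '↓↑': 0.0, '↑↓': 0.0, '↑↑': 0.5}
rng = random.Random(0)
print([measure(state, rng) for _ in range(8)])   # random, but the two spins always agree
```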

Some correlation of measurements of 
classical objects is possible, and this is poten- 
tially enhanced in the presence of unknown or 
hidden (but classical) variables. The maximal 
possible correlation by classical means can 
be derived mathematically in the form of an 
inequality, known as a Bell inequality. In the
current experiments, the variant concerned is
the CHSH (Clauser-Horne-Shimony-Holt) Bell
inequality, and its upper bound for classically
achievable correlations is 2. Entanglement gives
rise to quantum correlations that can exceed
this upper bound: the Bell inequality can be
violated up to a maximum value of 2√2, approxi-
mately 2.828. When such violations are meas-
ured experimentally, the results show that
entanglement is necessary to describe nature.
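The two bounds quoted above can be checked numerically. The sketch enumerates every local deterministic assignment of ±1 outcomes to reproduce the classical limit of 2, then evaluates the same CHSH combination for the normalized state ↑↑+↓↓. For that state, measuring both spins along directions at angles x and y within the x–z plane of the Bloch sphere gives the textbook correlation E(x, y) = cos(x − y). The measurement angles are the standard optimal choice, assumed here rather than taken from either experiment.

```python
import itertools
import math


def chsh(e_ab, e_ab2, e_a2b, e_a2b2):
    """CHSH combination S = E(a,b) + E(a,b') + E(a',b) - E(a',b')."""
    return e_ab + e_ab2 + e_a2b - e_a2b2


def classical_max():
    """Largest |S| over local deterministic strategies (fixed ±1 outcomes).

    Local hidden-variable models are mixtures of such strategies, so none can do better.
    """
    best = 0
    for a, a2, b, b2 in itertools.product((-1, 1), repeat=4):
        best = max(best, abs(chsh(a * b, a * b2, a2 * b, a2 * b2)))
    return best


def correlation(x, y):
    """E(x, y) for the state (↑↑ + ↓↓)/√2, spins measured at angles x, y in the x-z plane."""
    return math.cos(x - y)


def quantum_s(a, a2, b, b2):
    return chsh(correlation(a, b), correlation(a, b2),
                correlation(a2, b), correlation(a2, b2))


print(classical_max())                                        # 2
print(quantum_s(0, math.pi / 2, math.pi / 4, -math.pi / 4))   # ≈ 2.828 (2√2)
```

The classical limit follows because S = A(B + B′) + A′(B − B′), and one of the two brackets is always zero while the other is ±2.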

In 1982, the first experimental tests were 
done’, and demonstrated that entanglement 
does indeed seem to be necessary. Since then, 
any potential shortcomings in the experiments 
used to find violations of Bell inequalities have 
been ruled out”, albeit within statistical error 
limits. Tan and colleagues report a violation 
of up to 2.70, with a residual uncertainty that 
essentially rules out any classical descrip- 
tion of nature — their result is equivalent 
to being about 40 standard deviations away 
from the value obtainable using classical 
explanations. When preparation and readout 
errors in Ballance and co-workers’ study are 
accounted for, the theoretical maximum of 
the Bell inequality is 2.236; the authors report 
a violation of 2.228, with an uncertainty that 
means that the value differs by 15 standard 
deviations from any classical description. 

The results emphasize that science and engi- 
neering at the level of individual quanta can 
reveal and characterize quantum mechanics
with unprecedented accuracy, at close to 100%
detection efficiency. But they also impressively 
demonstrate how the total quantum perfor- 
mance of a system can be benchmarked — the 
proximity of the experimentally determined 
violations to their theoretical limits quanti- 
fies the quality, performance or fidelity of the 
quantum operations in a single number. 
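The benchmarking idea can be made concrete with the figures reported above. In this back-of-envelope sketch, dividing each violation's margin over the classical bound of 2 by the quoted number of standard deviations gives the uncertainty that comparison implies, and dividing each value by its theoretical maximum gives a single fidelity-like score. These derived numbers are rough (the standard-deviation counts are rounded) and are not values reported by the authors.

```python
import math

CLASSICAL_BOUND = 2.0
QUANTUM_MAX = 2 * math.sqrt(2)   # ideal CHSH maximum, about 2.828


def implied_sigma(measured, n_sigma, bound=CLASSICAL_BOUND):
    """Standard error implied if `measured` sits `n_sigma` standard errors above `bound`."""
    return (measured - bound) / n_sigma


# Figures quoted in the text: Tan et al. 2.70 at about 40 standard deviations;
# Ballance et al. 2.228 at about 15, against an error-adjusted maximum of 2.236.
tan_sigma = implied_sigma(2.70, 40)
ballance_sigma = implied_sigma(2.228, 15)
tan_score = 2.70 / QUANTUM_MAX
ballance_score = 2.228 / 2.236
print(f"implied sigma: Tan {tan_sigma:.4f}, Ballance {ballance_sigma:.4f}")
print(f"fraction of theoretical limit: Tan {tan_score:.3f}, Ballance {ballance_score:.3f}")
```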

The findings substantially improve the 
prospects for designing and realizing devices 
that use superposition states and entanglement 
as reliable resources, based on trapped ions or 
related systems. Different tasks in a common 
experimental protocol can now be allocated to 
the atomic species best suited for the chosen 
purpose — such as quantum memory, per- 
formance of logic operations with negligible 
effects on any nearby quantum memory ele- 
ments, and generating links to devices based 
on other technological platforms, such as 
photonic or solid-state devices. This paves the
way for precise spectroscopy, ultra-accurate
clocks and simulators of quantum systems. It 
might even enable the development of univer- 
sal quantum computers capable of running a 
superposition of many correlated tasks in par- 
allel, offering much better performance than 
is currently available using conventional com- 
puters, such as exponentially higher speeds for 
dedicated applications.
Experimental mismatch
in neural circuits 


The finding that acute and chronic manipulations of the same neural circuit can 
produce different behavioural outcomes poses new questions about how best to
analyse these circuits. SEE ARTICLE P.358


THOMAS C. SÜDHOF


Nobel Prize in Physiology or Medicine for 
his work using acute electrical stimulation 
to study neural circuits. Modern neuroscience 
is dominated by a newer, more sophisticated 
technique for acute circuit manipulation: 
optogenetics, in which light-sensitive ion- 
channel proteins are engineered to activate or 
inhibit select neurons’. However, a nagging 
doubt pervades the field — do the behavioural 
effects of acutely activating or silencing spe- 
cific neurons reflect the normal functions of 
these cells? On page 358 of this issue, Otchy et 
al.” systematically address this question. Their 
findings are bound to excite lively discussion.

If acute inactivation of a particular neural
circuit alters an animal’s behaviour, the seem- 
ingly logical conclusion is that the circuit con- 
trols the behaviour. But the brain’s circuits are 
densely interconnected, so how can we be sure 
that these behavioural effects are not caused 
by changes to other, connected, circuits that 
normally do not participate in the targeted 
behaviour but are affected by the manipula- 
tion? Otchy et al. used a brilliant study design 
to test this idea. They reasoned that, if the 
effects of acute manipulation are directly 
caused by the manipulated neurons, then
chronically manipulating those neurons, for 
example by permanently impairing (lesioning) 
them, should have the same effect. The authors 
compared the effects of chronic and acute neu- 
ral manipulations in rats and in zebra finches. 
They examined behavioural tasks that were 
learnt before the manipulations, but that were 
not repeatedly practised afterwards, avoiding 
the confounding effect of relearning a task 
after an experimental manipulation. 

First, Otchy et al. demonstrated that, in rats 
that had learnt a complex lever-pressing task, 
acute silencing of neurons in the brain’s motor 
cortex using the drug muscimol profoundly 
impaired task performance. Acute optogenetic 
activation of motor-cortex neurons produced 
a similar effect. The same research group had 
shown previously’ that surgical ablation of the 
motor cortex blocked the initial learning of 
the lever-pressing task, but had no significant 
effect on the ability of rats to perform the task 
if it had been learnt before surgery. Thus, acute
and chronic manipulations produce discrepant 
results in this circuit (Fig. 1a). 

In a second set of experiments, Otchy and
colleagues used muscimol to inactivate song
neurons in a brain region called the sensor-
imotor nucleus interface (Nif) in zebra
finches. This acute manipulation massively
impaired birdsong, whereas chronic lesioning 
of Nif had no effect two days after the lesion 
(Fig. 1b). Investigating this apparent paradox, 
the authors showed that the Nif lesions did 
initially cause a change in the downstream 
neural circuitry controlling birdsong, but that 
this change spontaneously recovered without 
training after 3.4 hours. The researchers pro- 
pose that homeostatic plasticity, which adjusts 
the overall activity level of neurons in a circuit,
might be involved in this recovery. However, 
other processes that change the strength of the 
synaptic connections between these neurons 
are equally likely to be responsible. 

How should we interpret these experiments? 
Two opposing hypotheses come to mind. 
First, that acute manipulations are unreliable 
and should be discarded in favour of chronic 
manipulations. Second, that acute manipu- 
lations elicit results that truly reflect normal 
circuit functions, and the lack of changes 
after chronic manipulations is caused by 
compensatory plasticity. 

Before choosing between these stark 
alternatives, several facts should be taken 
into account. Many chronic manipulations 
of neural circuits (both permanent genetic 
changes and physical lesions) do actually pro- 
duce major behavioural changes. For exam- 
ple, in rodent and human brains, lesions in 
the amygdala region impair fear memories’, 
and hippocampal lesions interfere with spa- 
tial memory’. Chronic deletion of the syn- 
aptic cell-adhesion molecule neuroligin-3 in 
striatal neurons alters learning of a repetitive 
motor task®. Thus, the finding that a chronic 
manipulation does not cause a behavioural 
change cannot simply be attributed to plastic- 
ity and compensation. 

Clearly, it is possible to dissect the 
functions of some types of neuron and circuit 
using chronic manipulations, making this a 
compelling overall experimental approach. 
But acute optogenetic manipulations are gen- 
erally easier to perform, and the conclusions 
drawn from many such manipulations do 
correlate well with those from chronic mani- 
pulations (see, for example, ref. 4). Moreover, 
such acute manipulations often match changes 
in neural activity observed during the targeted 
behaviour in vivo’, although a caveat of acute 
manipulations is that natural neural activity is 
normally limited to only a subset of neurons 
in a circuit, whereas acute manipulations are 
mostly not. 

There are multiple explanations for why acute 
and chronic manipulations might produce dis- 
tinct results, which makes it difficult, or perhaps 
even impossible, to assess whether results reflect 
‘off-target’ or ‘on-target’ effects, as Otchy et al. 
aptly call them. The authors point out that, 
because neural circuits are massively intercon- 
nected, acute manipulations are probably more 
susceptible to off-target effects than are chronic 
lesions. This is because acute manipulations are
more likely to spread to other connected circuits
that have no normal role in the targeted behav-
iour. Therefore, we cannot simply assume that
the behavioural readouts of such manipula-
tions always reflect the normal functions of the
manipulated circuits.

Figure 1 | Mixed messages from neural manipulations. Otchy et al.’ compared the effects of acute and
chronic manipulations of neural circuits on a specific behaviour. a, The authors taught rats to perform a
complex lever-pressing task. Chronic inhibition of neurons in the brain’s motor cortex did not affect task
performance, whereas acute perturbations strongly impaired performance. b, Likewise, chronic ablations
of neurons in the sensorimotor nucleus interface (Nif) of the brains of zebra finches did not affect their
songs, whereas songs became variable and unstructured after acute inhibition.

Where do we go from here? Most acute 
manipulation studies that use optogenet- 
ics confirm, and so add valuable support to, 
existing hypotheses that were established in 
earlier studies. But for those studies that have 
proposed new circuit functions, it may be 
advisable to re-evaluate the conclusions using 
independent approaches. 

In the future, it might be helpful always to 
correlate acute and chronic manipulations 
of specific neurons. If results from acute and 
chronic manipulations are discrepant, analyses 
of circuits that act in parallel to the manipu- 
lated circuit, or of similar neurons that are acti- 
vated by different stimuli, might be more likely 
to provide an explanation for the discrepancy 
than examination of chains of hierarchically 
connected neurons, because off-target effects 
probably propagate throughout neural cir- 
cuits by spilling over into adjacent, connected 
circuits. Moreover, studies of a broad range 
of behaviours might be helpful — restrict- 
ing a study to a few behaviours could make 
it harder to detect off-target effects. Overall, 
more caution about the conclusions drawn 
from circuit manipulations, be they acute 
or chronic, seems advisable, because most 
current studies focus on only one circuit and 
one behaviour. 


It is both an exciting and a sobering time for 
neuroscience. Exciting, because it is now pos- 
sible to manipulate neurons and circuits with 
an ease that was only dreamt of a few years 
ago. Sobering, because the massively parallel 
and interconnected nature of neural circuits 
is becoming apparent, and the complexity 
imposed on such circuits by various forms of 
plasticity has yet to be even touched on. By 
using parallel approaches to study circuits, we 
can develop an understanding of the brain that 
acknowledges the limitations of this under- 
standing, as well as its achievements. Such a 
strategy will drive the field forward.
MICROBIOMES


Curating communities
from plants


Large-scale cultivation and genome sequencing of the bacteria that inhabit 
the leaves and roots of Arabidopsis plants have paved the way for probing how 
microbial communities assemble and function. SEE ARTICLE P.364 


GWYN A. BEATTIE 


Vast networks of microorganisms live
in our soils, seas and bodies. These
microbiomes also develop in inti-
mate association with plants, in which they
can enhance nutrient uptake, growth and tol- 
erance to pathogens, pests and environmen- 
tal stresses. Recognition of the fundamental 
role of microbes in the health of plants and 
animals, and the centrality of microbes in 
many ecological processes, has led to recent 
proposals for international' and US-based’ 
microbiome initiatives. These proposals have 
highlighted a key need to develop collec- 
tions of cultured organisms for experimen- 
tal enquiry into the function and assembly 
of native communities’. On page 364 of this 
issue, Bai et al.* describe genome-sequenced 
bacterial culture collections that represent 
most of the species in native root- and leaf- 
associated microbiomes of Arabidopsis thali- 
ana plants. They show that these collections 
can be used to reproducibly establish commu- 
nities that resemble those found naturally on 
wild plants. 

High-throughput genomic sequencing is 
enabling the characterization of microbiome 
profiles based on nucleic-acid signatures 
and the total gene content of a community 
(metagenomics). The breadth and depth of this
profiling is increasing with the affordability of
sequencing. Because this approach does not
require the microorganisms to be cultivated,
it has transformed our understanding of the
taxonomic composition and gene content of
animal- and plant-associated microbiomes.
For example, cultivation-independent pro-
filing of the root microbiota of the model
flowering plant Arabidopsis has highlighted
compositional consistencies not only across
soils from multiple continents*”, suggesting
that these microbiomes share common assem-
bly processes, but also across multiple Arabi-
dopsis lineages’, suggesting their evolutionary
conservation.

Figure 1 | From correlation to causation. Cultivation-independent
profiling of microbial communities involves sequencing DNA fragments
amplified from cells to generate a comprehensive picture of the community
members. These profiles can be used to identify correlations, such as the
presence of specific microbes on leaves versus roots, and to evaluate the extent
to which culture collections represent the complete community. Bai et al.’
generated large collections of bacteria associated with the leaves and roots of […]

However, uncovering the microbiome 
assembly mechanisms requires the ability to 
manipulate microbial communities, including 
engineering and perturbing synthetic commu- 
nities. Thus, experimental enquiry into micro- 
biomes requires more than sequence data — it 
needs microbial cultures (Fig. 1). 

Bai and colleagues amassed and identified 
almost 8,000 bacterial isolates from the roots 
and leaves of Arabidopsis plants grown in 
the field or in the laboratory in soils taken 
from the field. These collections included 
representatives of most bacterial species 
that have been identified in Arabidopsis 
microbiomes by cultivation-independent 
profiling**, which suggests that most bacteria 
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communities. 
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associated with Arabidopsis leaves and roots 
are readily cultivated. This culturability of 
plant-associated bacteria contrasts sharply 
with the historic inability to culture the vast 
majority of bacteria in soil and aquatic habi- 
tats’, and it probably results from root and leaf 
habitats being rich in organic compounds and 
oxygen. The finding that these communities 
can be so well represented by culture collec- 
tions highlights the value of plant microbes as 
models for investigating the mechanisms of 
microbiome assembly and function. 

Bacteria associated with the roots and 
leaves of terrestrial plants generally fall into 
only a few phyla that are shared between 
these plant tissues**. By generating taxo- 
nomically representative culture collections 
of microbes from roots (194 isolates) and 
leaves (206 isolates), Bai et al. established 
that bacterial families in these phyla are 
generally found on both tissue types. How- 
ever, the function of microbiomes, par- 
ticularly with regard to their impact on the 
host plant, is probably strongly rooted at 
the species, subspecies and strain level, and 
information at these levels is captured by 
sequencing whole genomes. 

The authors generated high-quality draft 
genome sequences of their 400 root and leaf 
isolates, as well as 32 soil isolates, and exam- 
ined how the phylogenetic and functional 
diversity among isolates within microbial 
families correlates with their origins in roots 
or leaves. They found some evidence for 
microbial specialization to either the leaf or 
root niche: a few phylogenetic clusters were 
found only or primarily in one niche, and 
certain functional characteristics — such 
as the degradation of foreign chemical 
substances — were enriched in one niche more 
than the other. However, the taxonomy of the 
isolates predicted their functional diversity 
much better than did their origins on roots 
or leaves. The authors’ recognition of promi- 
nent family-level differences in functional 
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Arabidopsis thaliana plants and found them to be highly representative 

of the species present in cultivation-independent profiles. The authors used 
these culture collections to derive draft genomes, evaluate potential 
functional activities and create synthetic communities that, when 

applied to initially microbe-free plants, allowed experimental evaluation 

of factors that drive the assembly of leaf- and root-associated microbial 


diversification demonstrates a need for studies 
into how distinct taxonomic groups contribute 
to microbiome function. 

Synthetic microbial communities can be used to systematically query natural microbiome processes. Bai et al. introduced synthetic communities of 188 and 218 representative isolates from root (or soil) and leaf communities, respectively, onto gnotobiotic Arabidopsis plants — plants that were microorganism-free before inoculation with known microorganisms. They then evaluated the communities that assembled by sequencing the 16S ribosomal RNA genes, which help to identify the taxa present. These synthetic communities yielded assemblages on gnotobiotic plants that had consistent compositions, showing reproducibility in microbiome assembly processes; moreover, their composition resembled the native bacterial microbiomes found on wild Arabidopsis plants. Surprisingly, the resulting communities were not influenced by the relative proportion of the applied strains, indicating that community assembly is a robust process.
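One common way to put a number on "consistent compositions" across replicate plants is the Bray–Curtis dissimilarity between relative-abundance profiles built from 16S read counts. The sketch below is only an illustration: the article does not say which metric Bai et al. used, and the taxon names and read counts are hypothetical. A value near 0 means two communities are nearly identical in composition; a value of 1 means they share no taxa.

```python
def relative_abundance(read_counts):
    """Convert per-taxon 16S read counts into proportions that sum to 1."""
    total = sum(read_counts.values())
    if total == 0:
        raise ValueError("sample has no reads")
    return {taxon: n / total for taxon, n in read_counts.items()}


def bray_curtis(profile_a, profile_b):
    """Bray-Curtis dissimilarity between two relative-abundance profiles.

    For profiles that each sum to 1, the general formula
    1 - 2 * sum(min) / (sum(a) + sum(b)) reduces to 1 - sum(min).
    """
    taxa = set(profile_a) | set(profile_b)
    shared = sum(min(profile_a.get(t, 0.0), profile_b.get(t, 0.0)) for t in taxa)
    return 1.0 - shared


# Hypothetical read counts from two replicate gnotobiotic plants.
plant_1 = {"taxon_A": 500, "taxon_B": 300, "taxon_C": 200}
plant_2 = {"taxon_A": 450, "taxon_B": 350, "taxon_C": 200}

print(round(bray_curtis(relative_abundance(plant_1), relative_abundance(plant_2)), 3))  # prints 0.05
```

Working with proportions rather than raw counts keeps the comparison independent of how deeply each sample was sequenced.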

The synthetic communities were also instrumental in teasing apart two of the drivers of community assembly on Arabidopsis leaves: the source of the isolates (roots or leaves), and their arrival through the air or the soil. These findings demonstrate how synthetic communities can serve as windows on the origins and development of the bacterial component of plant microbiomes.

We are at a crucial juncture in microbiome research, transitioning from cataloguing microbes and genes to executing hypothesis-driven experiments. Bai and colleagues have provided resources that will speed this transition for plant research, including a large culture collection, complex synthetic communities with sequenced genomes and a gnotobiotic reconstitution system. Together, these resources enable recapitulation of the assembly of native bacterial communities on Arabidopsis plants, facilitating studies that provide ecologically relevant answers to questions about the establishment, dynamics, resilience, function and evolution of plant microbiomes. The mechanistic understanding derived from these synthetic communities is an excellent step on the road to understanding how the sustained health and productivity of our agricultural and natural systems are influenced by plant microbiomes and, more broadly, by phytobiomes — the networks of bacteria, fungi, oomycetes, viruses, nematodes, insects and other animals that affect plants.

A history of Greenland’s ice loss


Aerial photographs, remote-sensing observations and geological evidence together provide a reconstruction of mass loss from the Greenland Ice Sheet since 1900 — a great resource for climate scientists. SEE LETTER P.396


BEATA M. CSATHO 


Loss of ice-sheet mass is a major contributor to current sea-level rise and is expected to continue as global warming proceeds’. Detailed reconstructions of changes in the Greenland and Antarctic ice sheets over the past few decades are available, based on remotely sensed data. But extending this record further into the past poses a big problem because of the lack of systematic monitoring of changes in ice-sheet elevations. On page 396 of this issue, Kjeldsen et al.” present the first observation-based estimate of mass loss from the Greenland Ice Sheet from the end of the nineteenth century, when it began to retreat from its maximum extent achieved during the Little Ice Age (LIA), to the present day. Their findings show how the reconstruction of past ice-sheet changes helps to account for sources of sea-level rise and improves our understanding of the major processes controlling ice-sheet mass loss.

Figure 1 | The Upernavik Ice Stream in northwest Greenland. This glacier is one of many that drain the Greenland Ice Sheet into the sea. A trimline — distinguished by differently coloured rock above and below the line — is visible in the nearby hill, and indicates the maximum extent of ice during the Little Ice Age (LIA). Kjeldsen et al.’ report a reconstruction of mass loss from the Greenland Ice Sheet since 1900, the end of the LIA.

Historical photographs of ice sheets provide long-term context for mass loss by enabling measurements of their surface elevations and extent before satellites were used for remote sensing. Moreover, they facilitate the accurate mapping of glacial geomorphic features such as vegetation trimlines (Fig. 1) and moraines, which, in the case of the Greenland Ice Sheet, mark the highest extent of the ice sheet during the LIA. A treasure trove of aerial photographs of Greenland has been extensively used for many years’, but because of the difficulty in obtaining accurate surface-elevation measurements from historical photographs, a detailed timeline of mass loss was reconstructed only for the largest glaciers*”.

To estimate the mass-balance history of the Greenland Ice Sheet — the time course of differences between mass gained by snow accumulation and that lost by melting and calving of icebergs — since the LIA, Kjeldsen et al. began by reprocessing images taken by the comprehensive Greenland aerial photography survey during 1978–87. They used modern photogrammetric methods to derive high-resolution, accurate digital elevation models (DEMs) depicting the ice-sheet surface at the sheet’s margins during the survey period. They also reconstructed the ice-sheet margins during the LIA in three dimensions by mapping vegetation trimlines and glacial moraines. Taken together with laser-altimetry measurements from 2003 to 2010, these analyses enabled the authors to determine elevation changes for three different epochs since the 1900s.
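Elevation changes between two epochs become volume and mass changes by differencing co-registered DEMs cell by cell. The sketch below assumes two DEMs already sampled on the same grid, a uniform cell area and a single bulk density of 917 kg m⁻³ for glacial ice; the function name and grids are hypothetical, and real reconstructions, including that of Kjeldsen and colleagues, may treat density, firn and data gaps differently.

```python
import numpy as np

# Assumed bulk density of glacial ice (kg m^-3); an illustrative assumption.
ICE_DENSITY = 917.0


def dem_mass_change_gt(dem_early, dem_late, cell_area_m2, density=ICE_DENSITY):
    """Mass change (Gt) implied by two co-registered DEMs on the same grid.

    Cells that are NaN in either DEM (no data) are skipped.
    Negative values mean mass loss.
    """
    dh = np.asarray(dem_late, dtype=float) - np.asarray(dem_early, dtype=float)
    volume_m3 = np.nansum(dh) * cell_area_m2  # summed elevation change x cell area
    return volume_m3 * density / 1e12          # kg -> Gt (1 Gt = 1e12 kg)


# Hypothetical 2 x 2 grids of surface elevation (m) with 1 km x 1 km cells.
dem_1985 = np.array([[1000.0, 1010.0], [990.0, np.nan]])
dem_2010 = dem_1985 - 1.0  # uniform 1 m lowering wherever data exist

print(dem_mass_change_gt(dem_1985, dem_2010, cell_area_m2=1e6))
```

Dividing the result by the number of years between the two surveys gives an average rate in gigatonnes per year, the unit in which the article reports mass loss.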

The results show that the Greenland Ice Sheet contributed substantially to sea-level rise throughout the twentieth century, providing at least 25 ± 9.4 millimetres of the total global mean rise. Furthermore, rates of mass loss during 2003–10 were twice those during the twentieth century, mostly because of increasing water runoff from the surface, whereas discharge through iceberg calving has remained essentially the same since the LIA.

Kjeldsen and colleagues also report a large spatial variation in ice-sheet changes, indicating that the sheet’s response to climate forcing is modulated by local geometric factors such as the topography of the underlying bed and the sizes of the drainage basins of individual glaciers. The striking similarity between the elevation-change patterns during the different epochs suggests that local controls act similarly on both decadal and centennial timescales.

The authors’ discovery of a large mass loss, which averaged 75 gigatonnes per year (equivalent to a sea-level rise of 0.21 mm yr⁻¹) during the twentieth century, emphasizes the need for improvements to the record of ice-sheet changes before the start of detailed remote-sensing measurements in the 1990s. Existing long-term records are usually based on time series of the positions of ice-sheet margins, but such records can be misleading for glaciers that flow into the ocean, whose floating termini can advance or retreat without any substantial changes farther up-glacier. Furthermore, only repeated elevation measurements allow the quantification of mass loss that is necessary to estimate contributions to sea-level rise. Kjeldsen and co-workers’ results provide an excellent framework for selecting regions that represent different long-term mass-loss patterns for further detailed studies.
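The two figures quoted above (75 gigatonnes per year and 0.21 mm yr⁻¹) are linked by simple arithmetic: spread the meltwater volume evenly over the global ocean. The sketch assumes an ocean area of about 3.61 × 10¹⁴ m² and a meltwater density of 1,000 kg m⁻³, which works out to roughly 361 Gt of ice per millimetre of global mean sea level; these constants are assumptions, not values taken from the article.

```python
# Assumed constants (not from the article):
OCEAN_AREA_M2 = 3.61e14   # approximate area of the global ocean, m^2
WATER_DENSITY = 1000.0    # density of meltwater, kg m^-3


def gt_to_mm_sea_level(mass_gt):
    """Global mean sea-level equivalent (mm) of a given mass of land ice (Gt)."""
    mass_kg = mass_gt * 1e12                 # 1 Gt = 1e12 kg
    volume_m3 = mass_kg / WATER_DENSITY      # meltwater volume
    return volume_m3 / OCEAN_AREA_M2 * 1e3   # layer thickness, m -> mm


print(round(gt_to_mm_sea_level(75), 2))  # prints 0.21
```

The same function applied to an annual loss gives a rate in mm per year; it ignores second-order effects such as changes in ocean area and regional sea-level patterns.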

A crucial objective of those studies should be to examine the stability of the Greenland Ice Sheet between 1960 and 1990. It has been assumed that this ice sheet was in equilibrium during this period, and so calculated changes in its surface mass balance relative to the average during 1960–90 are used to work out whether recent ice-sheet surface losses are anomalous’. Kjeldsen et al. challenge this assumption by arguing that it contradicts the long-term persistent mass loss detected in their study. However, the temporal sampling of their study is not sufficiently detailed to rule out the possibility that a near-steady-state condition existed following the warm period that occurred in the 1930s and 1940s.

The rich archive of historical stereo aerial photographs of Greenland includes systematic surveys taken during the 1930s that were used to generate 1:250,000-scale topographic maps; oblique aerial photographs taken by the US Air Force for reconnaissance during the Second World War using a Trimetrogon camera (which also enable topographic information to be determined); and repeat surveys of the catchment basins of all major outlet glaciers around the Jakobshavn Isbræ during 1957–58 and in 1964, taken as part of the International Glaciological Expeditions to Greenland. Moreover, high-resolution stereo images collected by US intelligence satellites are available from the 1960s and 1970s. If all of these were combined with more-recent satellite observations, then a comprehensive record of long-term surface elevation, positions of calving fronts and ice margins, and ice-velocity changes could be obtained. This could be used to assess the implications of recent changes in the context of climate change and to provide input for modelling studies.

In the meantime, the authors’ reconstruction will help to improve numerical models by providing a time series of changes at ice-sheet margins for the whole Greenland Ice Sheet during the twentieth century, suitable for validating models. Although the extensive spatial overlap of laser altimetry and DEMs derived from stereo photogrammetry along the ice-sheet margins provides robust and accurate change detection in these regions between the 1980s and the present, further research — particularly the use of more-realistic ice-sheet models — is needed to derive accurate elevations within the interior of the ice sheet before the start of laser-altimetry observations in the 1990s. Improving the accuracy of past elevation reconstructions would result in better estimates of long-term mass-balance changes.

Finally, once the timing of equilibrium conditions for the Greenland Ice Sheet is verified, a detailed reconstruction for that period could serve as a steady-state ice-sheet surface for initializing ice-sheet models. Establishing such a steady-state surface is a prerequisite for deriving projections of future ice-sheet evolution that are more credible than currently available projections.

Twenty-five years of the sex-determining gene


The discovery that the gene SRY on the mammalian Y chromosome drives testis development marked a turning point in the decades-long quest to understand the genetic underpinnings and evolution of sex determination.


JENNIFER A. MARSHALL GRAVES 


It has long been known that a testis-determining factor (TDF) on the Y chromosome kick-starts testis development in humans and other mammals. The testes make hormones, and these hormones make the embryo male. Twenty-five years ago, Sinclair et al.’ reported in Nature that TDF was the gene SRY. This discovery opened up the surprisingly intricate genetic pathway that determines whether a baby is born a boy or a girl. It also led to an understanding of how genes on the Y chromosome evolved, and of the impact of this key evolutionary event.

Until the 1980s, there was no viable candidate sex-determining gene. Just where was TDF located? What kind of product did it encode? What did it do? During the 1980s, the position of TDF was narrowed down to a small region on the short arm of the Y chromosome, when it was found that some males had XX chromosomes that harboured a small piece of the Y, whereas some females had XY chromosomes that lacked bits of the Y — these added and deleted regions of Y were assumed to contain the TDF sequence. The race was then on to find TDF.

In 1987, the geneticist David Page and his associates” identified the first coding gene on the human Y, called ZFY. The gene looked like a winning candidate: it was in the right place; it was expressed in the testis; and it was evolutionarily conserved in other placental mammals, such as monkeys, mice, dogs and horses. But in 1988, PhD students in my laboratory’, Andrew Sinclair and Jamie Foster, mapped ZFY to a non-sex chromosome (an autosome) in marsupials, which are a separate branch of mammals. A few months later, it was found’ that, although ZFY is expressed in mouse sperm precursors, it is absent from the other cells of the testis, where a true TDF must be expressed to exert a sex-determining effect.

Sinclair joined a renewed hunt for human TDF in the laboratory of geneticist Peter Goodfellow, using DNA from XX males that had even smaller pieces of the Y than had previously been studied. This was slow and frustrating work, because the Y chromosome is full of repetitive sequences and so specific regions are hard to pinpoint. It was 1990 before they found’ a small coding gene close to the end of the Y chromosome (Fig. 1). Noncommittally, they called the gene SRY, for sex region on the Y. The final proof that SRY was the TDF came from the discovery of SRY mutations in XY females’ and from the demonstration that adding Sry to XX mice was sufficient to induce male development’. SRY was located on the Y in other placental mammals and, thankfully, even in marsupials’.

Researchers in the field imagined that identifying TDF would rapidly lead to an understanding of how it worked, and would point to other genes in the sex-determining pathway. But 25 years on, it has become clear that the pathway kick-started by SRY is complex, full of checks and balances.

Initially, SRY proved a puzzle because it was unlike any known gene. It turned out to be a member of a previously unidentified family, now called the SOX genes. Painstaking biochemical studies of the SRY protein revealed that it bound to a certain DNA sequence and bent it at an angle, presumably to bring other sequences — or the proteins bound to them — into proximity, promoting or inhibiting transcription®. The discovery of a different SOX gene that was disrupted in XY female babies with a severe bone deformity”"’ revealed that this gene, SOX9, is the binding target of SRY protein. SOX9 is now known to be a master regulator of sex determination throughout the vertebrates.

Studying the mutations that cause sex reversal in humans, mice, goats or dogs (the same pathway is active in all mammals) has proved a successful strategy for identifying many genes in the sex-determination pathway. Gradually, a network of genes that are regulated by, or regulate, SRY or SOX9 has been constructed, and their function tested by mutating the genes in mice’. Some genes promote testis formation, some maintain it, and yet others oppose them. This pathway and its control are still being explored. Our improved understanding has helped us both to answer fundamental scientific questions and to diagnose and treat many babies who are born with disorders of sex determination”.

The other major line of research enabled by the identification of SRY was the evolution of sex genes and chromosomes. The hunt for SRY in marsupials revealed that mammals have an SRY-related gene on the X chromosome, SOX3, which was proposed to be the ancestor of SRY”’. This idea is supported by human and mouse data™ that showed that misexpression of SOX3 in the undifferentiated gonad (a tissue that can develop into either an ovary or a testis, depending on the signals it receives) drives male development in XX embryos. SRY probably evolved from SOX3 when its 5′ region was replaced by a promoter sequence that drove expression in the gonad (Fig. 1).

[Figure 1 image panels: a, The X and Y chromosomes (TDF region marked); b, Discovering SRY; c, XY evolution (autosome pair; promoter for sperm precursors and central nervous system; promoter for gonad).]

Figure 1 | An evolving understanding of sex. a, In humans, sex is based on the presence or absence of the Y chromosome, seen here with its larger partner, X. The testis-determining factor (TDF) that drives male development was known to lie on the short arm of Y, but its identity was a mystery. b, In 1990, Sinclair et al.' found two males with only a small piece of Y, which had been broken and fused to the X. They scoured the 35,000 base pairs between the break points and the region at the tip of the Y that is shared with the X, finding several regions (black) that were specific to the Y. One of these regions contained the TDF gene, SRY. c, This discovery led to an understanding of how X and Y evolved. The gene SOX3 was located on a pair of non-sex chromosomes (autosomes) in the ancestors of mammals. A promoter sequence drove expression of SOX3 in sperm precursors and the central nervous system. The promoter on one copy of SOX3 was replaced with a sequence that drives expression in the undifferentiated gonad (a tissue that can develop into either an ovary or a testis). This expression pattern allowed the new gene, SRY, to direct testis development. Over time, genes not needed for male development were degraded on this chromosome, giving rise to the Y. (Part b adapted from ref. 1.)

Although it might seem counterintuitive that the testis-determining factor evolved from the X chromosome, it has since emerged’® that 20 of the 27 genes on the male-specific part of the human Y evolved from genes on the X. Thus, the Y is basically a degraded X chromosome. This supports the hypothesis that sex chromosomes originate when one member of an autosome pair acquires a sex-determining gene. Nearby genes then also acquire a sex-specific function, crossing over between the chromosome pair is suppressed to keep the male-specific gene package together, and the genetically isolated region on the sex-specific chromosome […] probably defined by the evolution of SRY. Vertebrate phylogeny puts the age of SRY and the XY pair at between 166 million and 190 million years. Furthermore, rapid speciation in other lineages that have undergone sex-chromosome turnover raises the possibility that acquisition of SRY might have driven the divergence of the egg-laying monotreme mammals from the rest of the mammalian lineage — monotremes have a bizarre, complex sex-determination system that is related to bird sex chromosomes”.

The future of the Y chromosome is now hotly debated. Evidence suggests that the mammalian Y will disappear in just a few million years if gene loss continues at the same rate as in the past’®. It has already disappeared in two groups of rodents, and SRY has been replaced by another gene from the sex-determining network”. The primate Y seems more stable”’, but will eventually erode away. Humans may be in for another round of sex-chromosome turnover — and maybe speciation — if and when SRY finally bows out.

50 Years Ago

The Royal Society Anniversary Address by Lord Florey, O.M., PRS. Perhaps the deployment of Government resources is the modern equivalent of events in the early days of the Society when Fellows contributed—or sometimes did not contribute—a shilling a week towards demonstrating experiments at meetings. There never was enough money... At the moment it is considered to be desirable to give free medicine to all. The application of free calamine lotion to the irritated skins of the populace may be more important than administering to the needs of irritated scientists; but this sort of judgement is in the realm of politics... it has long been the policy of the Society to have symposia and lectures... the popularity of such gatherings has brought difficulties... on one occasion, we had to migrate to the lecture theatre of the Shell Building on the South Bank... one consequence of this peripatetic existence has been that we have had to procure a coffin-like box for the transport of the mace, and I am sure that our original Fellows, and even Charles II himself, might have been somewhat astonished at the adventures of their royal emblem.
From Nature 18 December 1965

100 Years Ago

The Romanes Lecture ... was a scathing indictment of the ineptitude of the lawyer-politicians who possess a dominating influence on national affairs ... To the neglect of science, and the excessive predominance in Parliament and the Government of men with the spirit of the advocate to whom all evidence which will not support their case is unwelcome, Prof. Poulton ascribes the chief mistakes in the conduct of the war.
From Nature 22 November 1915

Rarity in mass extinctions and the future of ecosystems


Pincelli M. Hull¹, Simon A. F. Darroch²,³ & Douglas H. Erwin²


The fossil record provides striking case studies of biodiversity loss and global ecosystem upheaval. Because of this, many studies have sought to assess the magnitude of the current biodiversity crisis relative to past crises—a task greatly complicated by the need to extrapolate extinction rates. Here we challenge this approach by showing that the rarity of previously abundant taxa may be more important than extinction in the cascade of events leading to global changes in the biosphere. Mass rarity may provide the most robust measure of our current biodiversity crisis relative to those past, and new insights into the dynamics of mass extinction.


It has become commonplace to refer to the modern biodiversity crisis as the ‘sixth mass extinction’. With three short words, we place the biotic and environmental disturbance created by mankind on par with the greatest biodiversity crises of the past half billion years. This comparison demands close attention because the ‘Big Five’ mass extinctions include truly catastrophic events*, the biggest of which resulted in the inferred extinction of >75% of species alive at the time!*. In addition, mass extinctions have shaped the evolutionary history of the planet?”’. Organisms that were ecologically dominant before a mass extinction frequently do not survive, and rarely enjoy the same levels of dominance in the aftermath®*. However, there are fundamental differences between the types of data upon which past mass extinctions have been identified and those upon which the current biodiversity crisis is being assessed: abundant marine fossil genera on multi-million-year timescales for the former™!®, and (often rare) terrestrial species on decadal to centennial timescales for the latter'. So the question is critical: are we currently in the midst of the ‘sixth’ mass extinction, and can we develop an appropriate metric for the comparison of ancient and modern biotic crises?

The Big Five mass extinctions were profoundly disruptive events with effects extending far beyond the loss of taxonomic diversity!!"!°. In addition to extinction, all major mass extinctions are also characterized by prolonged intervals of ecological change’*'®. Ecosystems are composed of interacting networks of biotic and biophysical components, including taxa, nutrients, and their trophic and non-trophic interactions’’. Species loss and ecosystem reassembly during mass extinctions are unsurprising given the disruption of ecological networks!*. For hundreds of thousands to millions of years after mass extinctions, a series of short-lived, low-diversity and (at times) low-productivity ecosystems dominate'®!?”°. Large-bodied taxa often become dwarfed, or are replaced by small-bodied taxa”!”?. Previously dominant groups may be supplanted in the evolutionary diversifications that follow???, as new, diverse ecosystems are built?®. The largest extinction intervals result in permanent state changes in the structure of ecosystems, as well as in the character of the flora and fauna that dominate them>*>”7. Mass extinctions, therefore, not only punctuate the history of life; they also forever alter its trajectory.

In this light, the fossil record of mass extinctions is an important laboratory for understanding the effects of current environmental change on global ecosystem structure and function”®. Two key questions are how minor biodiversity crises become mass extinctions, and why mass extinctions tend to coincide with permanent state changes in global ecosystems. To date, studies have considered these issues by comparing projected rates of modern species loss with rates estimated from the fossil record!!!°, a method complicated by the need to extrapolate across temporal scales and abrupt state changes. Here, we propose a different approach, and consider whether the loss of species abundance—mass rarity—might have characterized past mass extinctions as they were occurring. Rarity is important for two reasons. First, it more accurately reflects function in ecological networks*?, and thus mass rarity (rather than mass extinction) may be a primary driver of the events and patterns associated with the mass disappearance of fossils from the fossil record. Second, the extent to which previously common taxa have become rare offers a direct metric of the size of the present biotic crisis. There may be no need to project current extinction rates in order to get a sense of the future of ecosystems. Mass rarity may be all that is needed to forever change the biosphere.


From past abundance to current rarity 

Humans have reduced the abundance of many historically common species. This increased rarity has come about through wholesale reduction in geographic ranges and/or population sizes, through modification of terrestrial habitats, appropriation of primary productivity for humanity, overexploitation and pollution, among other factors*!~3. On land, widespread evidence exists for ongoing habitat loss and population declines globally* 134. This includes, for instance, a 20% decline in habitat-specialist populations monitored by the Wild Bird Index since the 1980s, and continuing declines in the IUCN Red List Index of species survival aggregated across birds, mammals, amphibians and corals*!. Likewise, most fished coral reefs support less than half the expected fish biomass*, with long-term declines in the abundance of reef taxa since first human contact”. Among subsets of mammals, birds, butterflies, and highly mobile pelagic predators, more than 50% of the taxa studied have experienced range contractions in the last decades to centuries*”~*’. Yet to date, the absolute number of recorded species extinctions is dwarfed by those inferred for mass extinctions in the geological past’!, and local declines in species richness are equivocal*?”°. However, the extent of abundance loss is not equivocal, nor is the effect of land use*+. Mass rarity, that is, the reduction in geographic range and/or numerical abundance of a species globally, seems to be one or more orders of magnitude more severe than extinctions to date*!*, and is an urgent conservation priority for both species and ecosystems*4>-47. What remains a major unknown, however, is how global mass rarity today relates to the biotic crisis recorded in the fossil record, and what sustained mass rarity might mean for the future of ecosystems.


1Department of Geology and Geophysics, Yale University, New Haven, Connecticut 06520-8109, USA. Department of Paleobiology, National Museum of Natural History, Washington, 
DC 20013-7012, USA. Department of Earth and Environmental Sciences, Vanderbilt University, Nashville, Tennessee 37235-1805, USA. 
Figure 1 | Mass rarity and mass extinction are indistinguishable in the fossil record, and may have the same ecosystem effects. Anthropogenic activities have led to mass rarity of many previously abundant flora and fauna (right to middle). Mass rarity can look like mass extinction in the fossil record because the previously abundant taxa become so rare as to no longer be readily observed (bottom). Previously abundant and ecologically important groups, such as ecosystem engineers, may not actually become extinct, but decline below the abundance threshold required for them to perform their ecological roles, becoming ecological ‘ghosts’. Chance reassembly after mass rarity could lead to drastically different ecosystem structure and function even with minimal extinction (right), raising the question of what the future might hold. Artwork courtesy of Nicolle R. Fuller, Sayo-Art.

We suggest that global rarity today (that is, recent mass rarity, not the local rarity of most species in ecological studies as in ref. 48) may already be equivalent to intervals of pervasive fossil disappearance (Fig. 1). This is because the fossil record, particularly as it is preserved and studied across extinction boundaries (Box 1), primarily records the dynamics of durably skeletonized, geographically widespread, abundant taxa, and not the absolute presence or absence of all species originally in that ecosystem. When taxa are rare they can be missed, and when events are rapid, the order and importance of different factors can be hard to interpret.

The vast majority of species evolve, exist and become extinct without being preserved as fossils. The fossil record is instead dominated by species that inhabit environments with high preservation potential. Such environments include those in which sediment accumulates, such as in (or around) lakes, rivers, swamps, marine basins, or reef tracts. Even in such areas, most species stand little chance of being preserved. Rather, the fossil record is dominated by those taxa possessing heavily mineralized hard parts, such as teeth, bone or shells. Organisms that are very small, entirely soft-bodied, or occur in ephemeral habitats are rarely preserved. Additionally, as in living ecosystems, species that exist over a broad geographic range and in large numbers have a higher probability of being found than species that are rare and/or geographically restricted.

As a consequence, the fossil record of abundant, widespread, hard-bodied, marine taxa shapes our palaeontological perspective of the long-term dynamics of life (see Box 1). By definition, a mass extinction is an interval of time characterized by elevated rates of extinction relative to background intervals. In practice, however, mass extinctions are identified by the geologically sudden disappearance of abundant, long-lived genera (or higher-order taxa) from global-scale compilations of fossil occurrences of biomineralizing taxa.

The often-discussed ‘Big Five’ mass extinction events were first recognized in this way from the shelly marine fossil record: the end-Ordovician (~445 million years ago (Ma)), end-Devonian (~375 Ma), Permo-Triassic (PT; 251 Ma), Triassic–Jurassic (TJ; 199 Ma), and Cretaceous–Palaeogene (KPg; 66 Ma), although marine and terrestrial extinctions have subsequently been shown to often go hand in hand.

Detecting and predicting the ultimate severity of a mass extinction as it is happening requires a detailed understanding of the triggers and feedbacks of the extinction interval: the geologically brief interval of time when previously abundant fossil taxa disappear en masse (see Extinction in Fig. 2). Assessments of the severity of the current biodiversity crisis relative to those of the past presuppose an understanding of these geologically near-instantaneous events (Box 1). So, how much is actually known?


Changing the world 

Extinction intervals involve a primary trigger, secondary feedbacks, ecological transitions, and extinction (Fig. 2). The primary trigger (or set of triggers) is the environmental disturbance(s) that precipitates the mass extinction, for instance an asteroid impact or massive volcanism. A primary trigger need not itself drive many species extinct, as it does in the classic view of mass extinctions (Fig. 3a, scenario 1). Rather, it need only cause sufficient disturbance for processes like extinction debt or ecological collapse to result in mass secondary extinctions (Fig. 3b, scenario 2). A primary trigger might produce widespread rarity of formerly dominant taxa, thereby greatly elevating rates of background extinction for these taxa (Fig. 3c, scenario 3), or could directly cause the extinction of all species lost in a given interval. In addition, ecological turnover may precede the loss of taxa (that is, be driven by the primary trigger) or follow it (that is, result from the loss of species during extinction).

The brevity of mass extinctions (Box 1), combined with the time-averaged nature of the fossil record, currently precludes an understanding of the relative contribution of these four processes (Fig. 3). This makes it very difficult to use fossil data to disentangle alternative scenarios of the dynamics of mass extinctions: ‘trigger kills all’ (Fig. 3a), ‘trigger sparks feedbacks and secondary extinctions’ (Fig. 3b), and ‘trigger drives mass rarity and elevated extinction risk’ (Fig. 3c). We have little information yet about the relative importance of primary and secondary extinctions or mass rarity during past events.
BOX 1
The scale of extinction dynamics

Extinction intervals are extremely short (Fig. 2), even geologically instantaneous, relative to the typical resolving power of the fossil record (see Box Figure). The three mass extinction events with the best geochronologic constraints on their duration (PT, TJ and KPg) all occurred on timescales of the order of 10³–10⁴ years. In exceptional circumstances, rapid sedimentation may preserve a temporally detailed record of a mass extinction in a local region. However, taphonomic and sedimentological processes typically time-average accumulations of shell material, such that individual samples represent communities mixed over 10³–10⁴-year intervals. We consider events ‘geologically instantaneous’ if they occur on timescales at or below the resolution of the records used to study them (here, up to 10⁷ years). While exceptional ‘snapshots’ of the fossil seafloor during a single moment in time do exist (that is, Konservat-Lagerstätten), they are so infrequent that they rarely figure in studies of mass extinctions, and none has yet been discovered crossing a major extinction boundary. The global palaeontological and marine core compilations that are so key for detailing the broader patterns of extinction currently lack the temporal resolution needed to disentangle the dynamics within the extinction interval. The unavoidable conclusion is that the ‘pixel size’ of the fossil record may be too temporally coarse, or spatially restricted, to resolve the most important processes during the extinction phase.
Box Figure | Mismatch in the spatio-temporal scale of ecosystem collapse and the resolving power of the fossil record. The fossil record provides detailed records of macroevolutionary processes occurring at many spatial and temporal scales (shaded regions). The dynamics of extinction intervals occur on spatial and temporal scales just beyond those that are readily documented (striped box).


To be clear, these three scenarios are distinguished by the internal dynamics of the extinction interval (Figs 1 and 3). In scenario 1, the extinction of well-fossilized taxa is driven by the trigger and coincides with, or even precedes, major environmental change. In scenarios 2 and 3, mass extinction is delayed until after profound ecological disruption, being driven by secondary feedbacks or by elevated background extinction risk, respectively.

Comparing the present crisis to those that have occurred in the past requires knowing which of these scenarios is typical or dominant, as each involves distinct patterns of feedback, propagation of risk, and timing of extinction. To date, palaeontologists have acted on the implicit assumption that the first scenario is correct (with rare exceptions, as in refs 18, 56, 57), whereas the minimum that the fossil record indicates is a geologically instantaneous loss in the abundance of previously dominant taxa at the extinction boundary (the third scenario). The relative importance of these scenarios during the extinction interval cannot be disentangled by standard quantitative palaeontological approaches, like those used to estimate species ranges or to control for uneven sampling in diversity dynamics, because the timescale of the extinction interval is much shorter than the uncertainty intervals associated with these approaches.

That said, the dynamics of modern ecosystems support the inference that mass rarity can drive permanent ecosystem change. Taxa need not go locally or globally extinct to destroy the links in an ecological network. Rather, species simply have to become so rare as to be ecologically insignificant. For instance, in the Chesapeake Bay, changes in land use (runoff, sedimentation and nitrification) and overfishing of oysters in the 19th and 20th centuries contributed to a shift from a highly productive estuarine ecosystem with thriving oyster, crab and fish fisheries to a eutrophic, oxygen-depleted, bacterially dominated system. Likewise, overfishing of North Atlantic cod resulted in a shift from a fish (cod)-dominated system to one dominated by invertebrates (shrimps, crab and lobster). In the Caribbean, coral reefs collapsed after centuries of overfishing and pollution compounded by warming, coral bleaching, disease and invasive species, with widespread replacement of corals by macroalgae. In each case, the new structure seems to be an alternative stable state, as extensive management efforts have been unable to restore historic ecosystem structure.

The fossil record likewise documents examples of profound ecosystem change owing to shifts in the relative abundance (not just presence or absence) of taxa, including many of the turnovers in dominant reef builders, the rise of angiosperms and C4 grasses, and changes during past biodiversity crises (see discussion below). In short, there is no a priori reason to believe that the extirpation of species drives the ecosystem changes observed at mass extinction boundaries; global mass rarity may be as plausible a mechanism for ecosystem change as mass extinction. This being the case, we suggest that the extent of mass rarity might be the best metric for comparing the current crisis to those in the fossil record.


The kill mechanism need only make the common rare 
Although palaeontologists have focused on extinction more than rarity, they have identified unusual phenomena associated with rarity during mass extinction episodes. Rarity matters because geographically or numerically restricted taxa typically have a relatively small probability of being preserved in the fossil record, or of being recovered by palaeontologists. A species that undergoes a drastic reduction in population size, or contraction in range size, can thus appear to be ‘extinct’ in the fossil record until that population either recovers or eventually dies out entirely.
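The dependence of apparent extinction on abundance can be illustrated with a deliberately simple toy model. This model is our assumption for illustration, not one proposed in the text: if each of N individuals is independently preserved and later recovered with the same small probability p, the chance of recovering at least one fossil is 1 − (1 − p)^N. The function name and the value p = 10⁻⁶ below are hypothetical.

```python
import math


def detection_probability(n_individuals: int, p_recovery: float) -> float:
    """Probability that at least one of n individuals is found as a fossil.

    Toy assumption: every individual is preserved and recovered
    independently, each with the same probability p_recovery.
    """
    if n_individuals < 0:
        raise ValueError("n_individuals must be non-negative")
    if not 0.0 <= p_recovery <= 1.0:
        raise ValueError("p_recovery must lie in [0, 1]")
    if p_recovery == 1.0:
        return 1.0 if n_individuals > 0 else 0.0
    # 1 - (1 - p)^n, written with log1p/expm1 so tiny p values keep precision;
    # max() guards against returning -0.0 when n or p is zero.
    return max(0.0, -math.expm1(n_individuals * math.log1p(-p_recovery)))


if __name__ == "__main__":
    p = 1e-6  # hypothetical per-individual preservation-and-recovery chance
    for n in (10_000_000, 100_000, 1_000):  # abundant, then 99% and 99.99% declines
        print(f"N = {n:>10,}: P(at least one fossil) = {detection_probability(n, p):.4f}")
```

Because 1 − (1 − p)^N is close to 1 − e^(−Np), detection stays near certain while N·p is much larger than 1 and collapses once N·p falls below about 1: in this sketch a 100-fold decline drops the chance from essentially 1 to roughly 0.1, and a 10,000-fold decline to roughly 0.001, even though the taxon never goes extinct.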

Species that disappear from the fossil record, sometimes repeatedly and often for millions of years, only to subsequently reappear are called ‘Lazarus’ taxa. Such taxa are known from each of the Big Five mass extinction boundaries. They include a variety of clades with high preservation potential, such as molluscs across the PT extinction, brachiopods across the Ordovician–Silurian and KPg extinctions, and ostracods across the Late Devonian extinction. Outside of extinction boundaries, once-abundant taxa can also vanish from the fossil record for millions of years or more without going extinct, owing to rarity. Striking examples include the coelacanth fishes (currently extant; ~70-million-year fossil gap) and the once widely abundant marine alga Cyclagelosphaera (currently extant; 54-million-year fossil gap).

Another example of extinction-related rarity is found in species that persist in low numbers through an extinction interval before dying out in the aftermath, a phenomenon known as ‘Dead Clades Walking’. A frequently cited case is that of bellerophontid gastropods after the PT extinction. More generally, an estimated 10–20% of the genera surviving extinction intervals die out before global biodiversity recovers. For other taxa, we might imagine that the sudden loss of fossils across a boundary is driven either by extinction or by persistent rarity. In the second case, rarity and range contractions at extinction boundaries can be followed by eventual extinction, long disconnected from the last fossil occurrence.

Figure 2 | The sequence of taxonomic and ecosystem events across extinctions is unclear. Extinction intervals have four recognized phases (at the top: pre-extinction, extinction, recovery, post-extinction), based on the richness of fossils preserved. The relationship between fossil diversity and changes in ecosystem structure and function is unclear: ecosystem changes may precede, coincide with, or follow the loss of fossil diversity (blue solid to dashed line). A wide variety of palaeontological phenomena (grey boxes) document pervasive rarity as a feature of past mass extinctions. Most are widely accepted phenomena, with only the evidence for lowered productivity still debated within and among events.

Figure 3 | The geological brevity of mass extinctions makes it difficult to discern the relative importance of various processes. Mass extinction intervals are geologically instantaneous, making it difficult to measure the processes responsible for determining the size and ecological impact of any event. Three major extinction interval scenarios are (top) scenario 1: the primary extinction trigger directly kills off the pre-extinction taxa, with the size and impact of extinction determined by the trigger; (middle) scenario 2: the extinction trigger kills key taxa (or environmental resources), with feedbacks leading to secondary extinctions; or (bottom) scenario 3: the trigger makes many species rare, many of which go extinct, and when abundant populations recover, the ecosystem, by chance, is structured differently. In scenarios 2 and 3, the decreased abundance of key taxa is sufficient to diminish their ecological effect (they become ecological ghosts) and precipitates further ecosystem collapse through secondary extinction and feedbacks. Also note that the primary trigger can be called the ‘kill mechanism’ and can include multiple coincident disturbances.

Three final attributes of past mass extinctions support the hypothesis of pervasive mass rarity: the short-lived dominance of post-extinction taxa, the rarity of previously widespread habitats, and evidence for decreased primary productivity in the wake of extinctions. Those species that dominate assemblages immediately after extinctions are known as ‘bloom taxa’. They have been recognized from the major, as well as many minor, extinction events. The ecological success of post-extinction dominants in the unusual ecosystems characterizing extinction aftermaths coincides with the prolonged rarity of all other taxa. At the same time, pre-extinction habitats themselves often become rare or altered, as revealed by changes in the composition, continuity and texture of common sedimentary rock types. In addition, the rate of sediment accumulation is often much lower during and after the extinction interval (for example, prolonged low sedimentation after the PT), a feature due at least in part to the low abundance of fossil-forming organisms (as for pelagic sediments after the KPg). This and other lines of evidence have been used to argue for some suppression of primary productivity in the aftermath of extinctions, although to what extent this is true is still hotly debated. Regardless, these lines of evidence indicate that pervasive rarity of formerly abundant taxa is a unifying feature of extinctions and their aftermaths.

This evidence for mass rarity during past extinction events is surprisingly similar to the widespread rarity of previously common flora and fauna today. The modern ocean is full of ecological ‘ghosts’: taxa that are so rare they no longer provide their past ecological services. Mass rarity includes local, often remarkable, declines in species abundance, as well as range contractions (as reviewed in refs 38 and 44). For those species with excellent historical and fossil records, like Caribbean corals, the recent population collapse contrasts with their marked resilience to past climatic perturbation. What's more, the loss of species abundance is known at times to have cascading effects on ecosystem structure and function, and extinction debt may cause extinction hundreds to millions of years after an environmental perturbation. In this light, the paucity of extinctions in the oceans to date should not be viewed as a sign of the relative health of marine ecosystems; rarity itself may be the most direct metric of how close global ecosystems are to a permanent state shift.


Saving the fossil record of today 

The effect of humanity is so pervasive that we are leaving a globally recognizable mark in the rock record. Some scientists are seeking to formally recognize this moment as the ‘Anthropocene’, defining it as the epoch of human-dominated Earth systems. As we consider humanity's effect on the biosphere, we must recognize that this history is still being written in stone and it remains ours to shape. Thus our hypothesis of past mass extinctions as mass rarity events offers a to-do list for avoiding the ecological aftermath of catastrophic and global biotic crises.

For ecologists and conservation biologists, we have argued that, on timescales comparable to those studied today, past mass extinction events may have been characterized by the geologically instantaneous mass rarity of previously abundant, widespread, well-preserved species. This argument is supported by the nature of the rock record, in which the observed presence or absence of a fossil species depends as much on its abundance as on its existence. The rarity of previously common taxa is the only factor tied with certainty to the profound ecological change observed across extinction boundaries. And rarity alone may be enough to drive permanent shifts in the Earth system long before ‘rare’ turns into ‘extinct’. Because of this, we argue that changes in the abundance and ranges of previously common taxa provide an additional, potentially more accurate metric of the severity of the current biotic crisis relative to those in the past than do extrapolated extinction rates.

To date, the majority of extinction studies have been biased towards terrestrial species and charismatic megafauna, and we know relatively little about changes in the abundance and ranges of the shelly marine invertebrates that would provide a direct link to mass extinctions in the fossil record. Rarity of previously common taxa matters. To avoid a mass-extinction-like fossil record, we need to increase the population size and geographic range of once-abundant taxa and trophic groups (that is, reverse defaunation and defloration) and minimize the geographic extent of habitat destruction.

From the custodians of deep time, we need quantitative assessments of the fossil record of the present and future Earth in order to accurately size up current biotic changes through the same filter through which we see the past. Equally important will be studies of the dynamics and resilience of full ecological networks (not just trophic food webs) during massive perturbations. Spatially explicit models of the various extinction scenarios (Fig. 3) would likewise aid in distinguishing among the potential mechanisms at play during mass extinctions. Ongoing efforts to build palaeontological data archives and to collect finely resolved records from extinction boundaries are likewise key, as they provide the means to test emergent predictions globally on relevant timescales, and key processes, like geographic rarity, on others. Finally, the fossil record offers numerous examples of ecosystem change with and without fossil extinctions. How and why this occurs is a key question to address if we are to predict, and avoid, a state shift in the structure and function of our biosphere in the years to come. Although extinctions are rare, the ecological ghosts of oceans past already swim in emptied seas.
Growth and splitting of neural sequences in songbird vocal development

Tatsuo S. Okubo, Emily L. Mackevicius, Hannah L. Payne, Galen F. Lynch & Michale S. Fee


Neural sequences are a fundamental feature of brain dynamics underlying diverse behaviours, but the mechanisms by 
which they develop during learning remain unknown. Songbirds learn vocalizations composed of syllables; in adult 
birds, each syllable is produced by a different sequence of action potential bursts in the premotor cortical area HVC. 
Here we carried out recordings of large populations of HVC neurons in singing juvenile birds throughout learning to 
examine the emergence of neural sequences. Early in vocal development, HVC neurons begin producing rhythmic bursts, 
temporally locked to a ‘prototype’ syllable. Different neurons are active at different latencies relative to syllable onset to 
form a continuous sequence. Through development, as new syllables emerge from the prototype syllable, initially highly 
overlapping burst sequences become increasingly distinct. We propose a mechanistic model in which multiple neural 
sequences can emerge from the growth and splitting of a common precursor sequence. 


Sequences of neural activity have been observed during various behaviours, including navigation, short-term memory, decision making, and complex movements, suggesting that neural sequences are a fundamental form of brain dynamics. However, the circuit mechanisms underlying the generation of neural sequences and their development during learning are not well understood.

The songbird is a good model system to address such questions because the song produced by adults is learned during development. Furthermore, adult song is associated with neural sequences in nucleus HVC, a premotor cortical area necessary for the production of stereotyped adult song. Most projection neurons in HVC generate a brief burst of spikes at one specific time in the song motif, and different neurons are active at different times in the song; thus, distinct syllable types are produced by largely non-overlapping neural sequences in HVC. Here we ask how these different neural sequences are constructed during vocal development.

Zebra finches acquire their stereotyped song through a gradual learning process. Young birds initially produce a highly variable ‘subsong’, akin to human babbling. Birds then enter the protosyllable stage as they begin to incorporate syllables of a characteristic ~100 ms duration. This is followed by the gradual emergence of multiple syllable types, and a final ‘motif’ stage in which syllables are produced in a reliable sequence. While HVC activity is not required for subsong, it is required for song components in all later stages, including protosyllables, emerging syllable types, and adult song.


Developmental progression of HVC activity 

To elucidate the mechanisms by which neural sequences in HVC develop, we recorded from populations of HVC projection neurons in juvenile and adult birds (n = 1,149 neurons, 35 birds; Extended Data Fig. 1a). At all stages of vocal development, HVC projection neurons generated brief bursts of spikes during singing (Fig. 1a–c, Extended Data Fig. 1b, c). In the subsong stage (n = 12 birds; defined by an exponential distribution of syllable durations, before the emergence of protosyllables), roughly half the neurons generated bursts not temporally locked to syllable onsets (Extended Data Fig. 1d), while the other half produced bursts that tended to occur at a particular latency relative to subsong syllable onsets (Fig. 1a and Extended Data Fig. 1e–i; 19/39 neurons exhibited syllable locking). The fraction of neurons locked to syllable onsets exhibited a gradual and significant increase throughout vocal development (Fig. 1f; correlation with song stage: r = 0.22,
P< 107"; see Methods) until, in adult birds, virtually every projection 
neuron generated bursts precisely locked to syllables, as previously described.

Song development is characterized by a gradual change in song rhythm. The subsong stage, which has little evidence of rhythmic song structure, ends with the emergence of a rhythmically produced protosyllable (5–10 Hz). This is followed by an increase in the period between repetitions of the same sound, attributable to the addition of new song syllables. HVC exhibited parallel changes in rhythmicity. In the subsong stage, most projection neurons did not burst rhythmically (Fig. 1a, f; 3/39 neurons were rhythmic). In the protosyllable stage, roughly half of the projection neurons generated rhythmic bursts (5–10 Hz) (Fig. 1b, f; 70/135 neurons were rhythmic; period 169 ± 6.4 ms, mean ± s.e.m.). Such bursts were typically locked to rhythmic protosyllables, but were also commonly observed during portions of the song with less rhythmic syllable onsets, particularly early in the protosyllable stage (Extended Data Fig. 2a–d). On average, both the fraction of rhythmic HVC neurons and the period of the HVC burst rhythm gradually increased during the emergence of new syllable types and the formation of the song motif (Fig. 1f, g; correlation between song stage and fraction of rhythmic neurons: r = 0.28,
P< 107"; correlation between song stage and period of burst rhythm: 
r=0.57, P<107"). 
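The burst-rhythm periods reported above (for example, 169 ms in the protosyllable stage, or about 5.9 Hz) can be estimated from a spike train in several ways, and this section does not say which method was used. The sketch below assumes one common approach chosen purely for illustration: take the autocorrelation of the binned, mean-subtracted spike train and return the lag of its highest peak within the 5–10 Hz band. The function name, the 5 ms bin width, and the synthetic spike times are all hypothetical.

```python
import numpy as np


def burst_rhythm_period(spike_times_s, bin_s=0.005, min_hz=5.0, max_hz=10.0):
    """Estimate the dominant burst-rhythm period (seconds) of one neuron.

    Illustrative method (an assumption, not necessarily the authors'):
    bin the spike train, take its mean-subtracted autocorrelation, and
    return the lag of the highest peak whose period lies in 1/max_hz..1/min_hz.
    """
    spikes = np.asarray(spike_times_s, dtype=float)
    if spikes.size == 0:
        raise ValueError("need at least one spike")
    n_bins = int(np.ceil(spikes.max() / bin_s)) + 1
    counts = np.zeros(n_bins)
    np.add.at(counts, (spikes / bin_s).astype(int), 1.0)  # spike count per bin
    counts -= counts.mean()
    # 'full' mode returns lags -(n-1)..(n-1); keep only lags >= 0.
    autocorr = np.correlate(counts, counts, mode="full")[n_bins - 1:]
    lags_s = np.arange(n_bins) * bin_s
    in_band = (lags_s >= 1.0 / max_hz) & (lags_s <= 1.0 / min_hz)
    if not in_band.any():
        raise ValueError("recording too short for the requested band")
    best = int(np.argmax(np.where(in_band, autocorr, -np.inf)))
    return float(lags_s[best])


if __name__ == "__main__":
    # Synthetic neuron: a 3-spike burst every 170 ms for about 10 s (made-up data).
    spikes = [k * 0.170 + offset for k in range(59) for offset in (0.001, 0.002, 0.003)]
    period = burst_rhythm_period(spikes)
    print(f"period = {period * 1000:.0f} ms, rhythm = {1.0 / period:.1f} Hz")
```

Restricting the search to the 5–10 Hz band keeps the estimate from locking onto the zero-lag peak or onto multiples of the true period; a real analysis would also need a criterion for calling a neuron rhythmic at all, which this sketch does not attempt.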

A substantial fraction of projection neurons (285 of 1,117 neurons) in 
juvenile birds generated bursts related to song bouts—defined as epochs 
of continuous singing bounded by periods of silence (see Methods). 
Bout-related neurons generated brief bursts of spikes immediately 
before bout onset (‘bout-onset’ neurons; 137/285 neurons) or after 
bout offset (98/285 neurons) (Fig. 1d, e and Extended Data Fig. 2e-]; 
an additional 50/285 neurons were active both before and after bouts). 


Growth of a neural protosequence 
We next wondered how the activity of HVC projection neurons is coordinated across the neural population during protosyllables. Multiple recordings in the same bird revealed that different neurons were active at different times with respect to protosyllable onsets (Fig. 2a, b and Extended Data Figs 1n and 9k; n = 3 birds, 54 neurons), with latencies spanning the duration of the protosyllable and the intervening gap (>90% burst coverage; Extended Data Fig. 2t). These findings suggest that protosyllables are generated by a rhythmic protosequence: a repeating motor program composed of a continuous sequence of bursts in HVC.

Figure 1 | Singing-related firing patterns of HVC projection neurons in juvenile birds. a, Neuron recorded in the subsong stage, before the formation of protosyllables (RA-projecting HVC neuron, HVCRA; 51 dph; bird 7). Top, song spectrogram with syllables indicated above. Bottom, extracellular voltage trace. b, Neuron recorded in the protosyllable stage (HVCRA; 62 dph; bird 2). Protosyllables indicated (grey bars). c, Neuron recorded after motif formation (HVCRA; 68 dph; bird 8). d, Neuron bursting exclusively at bout onset (X-projecting HVC neuron, HVCX; 61 dph; bird 2). e, Neuron bursting exclusively at bout offset (HVCRA; 65 dph; bird 2). f, Developmental change in the fraction of neurons locked to syllable onsets (grey) and fraction of neurons with rhythmic bursting (black) (mean ± s.e.m.; n = 39, 135, 565, 378 and 32 neurons, respectively). g, Mean period of the HVC rhythmicity as a function of song stage (n = 3, 70, 356, 298 and 25 neurons, respectively). ***P < 0.001, post-hoc comparison with the adult stage. Spectrogram vertical axis 500–8,000 Hz. Scale bars for panels a–c, 0.5 mV, 200 ms; panels d–e, 1 mV, 500 ms. Insets in panels a–c show zoom of bursts indicated by an asterisk; scale bar, 5 ms.

We next examined the developmental emergence of this rhythmic protosequence. In the subsong stage (Fig. 2c; n = 19 neurons, 12 birds), bursts had a significantly earlier distribution of latencies compared to the broader distribution of burst latencies in the protosyllable stage (n = 104 neurons, 13 birds; P = 0.02; 63% versus 43% of bursts before syllable onset in the subsong stage and protosyllable stage, respectively). Even though the range of latencies was narrower in subsong birds, different neurons recorded in the same bird were locked to syllable onsets at different latencies (Extended Data Fig. 1f–i). This suggests the existence of transient sequential activity, initiated just before syllable onset but decaying within a few tens of milliseconds. This sequential activity appears to grow during the protosyllable stage to form longer sequences that can persist for more than a hundred milliseconds, throughout the duration of the protosyllable (Fig. 2b, c).

Figure 2 | Rhythmic sequences in HVC during the protosyllable stage. a, Three neurons recorded from bird 2 during the protosyllable stage (top: HVCX; 63 dph; bottom: simultaneous recording of two neurons, both HVCX; 64 dph; scale bar, 0.5 mV). b, Raster plot of 28 HVC projection neurons aligned to protosyllable onsets (sorted by latency; 57–64 dph, bird 2). Antidromically identified HVCRA neurons indicated by circles at right. c, Distribution of burst latencies relative to syllable onset in the subsong stage (top), protosyllable stage (middle), and multi-syllable/motif stages (bottom), across all birds (n = 19, 104 and 814 neurons, respectively). Black triangles indicate median burst times.


Sequence splitting during syllable formation 

We next wondered how distinct sequences in HVC, each corresponding to a distinct adult syllable type, emerge during vocal learning. Here we hypothesize that new syllable types can emerge by the gradual splitting of a single protosequence. In this view, the neural sequences underlying newly emerging syllable types would initially be largely overlapping, with neurons shared across the emerging syllables. Splitting would be associated with an increasing number of neurons selective for a particular emerging syllable type, and a decreasing fraction of shared neurons.

To test this hypothesis, we recorded from HVC projection neurons (n = 769) in 6 juvenile birds while they acquired multiple syllable types. As a first example, we describe changes in HVC population activity in a bird (n = 375 projection neurons; bird 1) that developed two acoustically distinct syllable types (labelled β and γ) over the course of several days (Fig. 3a, b; β and γ eventually form adult syllables B and C, respectively). During the protosyllable stage (56-59 days post-hatch, dph), the majority of projection neurons participated in a rhythmic protosequence (Extended Data Fig. 1n; n = 14/16 neurons; for example, Fig. 3c). After the emergence of syllable types β and γ (62-72 dph), many neurons were selectively active only during β or only during γ, but not both (Fig. 3d, f; of 105 neurons active during either β or γ, 41 were β-specific and 42 were γ-specific). The bursts of these syllable-specific neurons exhibited a wide range of latencies, with the spiking activity of neurons in each group spanning the entire duration of each syllable (Fig. 3g). Notably, we also observed a substantial population of neurons that were significantly active during both β and γ (n = 22 ‘shared’ neurons; Fig. 3e-g). Simultaneous recordings revealed the co-occurrence, in different neurons, of shared and specific firing patterns (Fig. 3f, Extended Data Fig. 3a, b).

Shared neurons exhibited a number of striking characteristics. These neurons burst rhythmically with the same inter-burst interval as neurons recorded in the protosyllable stage (Fig. 3e, f; Extended Data Fig. 3f-j). Shared neurons were active, as a population, at a wide range of latencies within emerging syllables (Fig. 3g), and, crucially, for a given shared neuron, the bursts during β occurred at a similar latency to the bursts during γ (Fig. 3g, Extended Data Fig. 4a-d). Thus, the population of shared neurons generated the same continuous burst sequence during both β and γ. This shared sequence occurred even at times when there was a significant acoustic difference between the two syllables (Extended Data Fig. 5). We also found that the fraction of shared neurons later in development (81-112 dph) was significantly lower than in the earlier recordings (Fig. 3h; 10 shared and 90 specific neurons; P = 0.03). Thus, the refinement of β and γ into the adult syllables B and C coincides with a decrease in the fraction of shared neurons, producing a gradual splitting of these representations into increasingly non-overlapping ‘daughter’ neural sequences.
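The statistical test behind P = 0.03 is not named in this section. As a hedged sketch, assuming a Pearson chi-square test (1 degree of freedom, no continuity correction) on the 2 × 2 table of shared versus specific neuron counts, the comparison can be written as follows; `shared_fraction_test` is a hypothetical name. The early counts (22 shared, 83 specific) follow from the 105 neurons reported for 62-72 dph, and the late counts (10 shared, 90 specific) from the 81-112 dph recordings.

```python
import math

def shared_fraction_test(shared_early, specific_early, shared_late, specific_late):
    """Compare the fraction of shared neurons between two developmental periods.

    Assumption: Pearson chi-square test on a 2 x 2 contingency table
    (1 degree of freedom, no continuity correction).
    Returns (early shared fraction, late shared fraction, chi-square, P value).
    """
    table = [[shared_early, specific_early], [shared_late, specific_late]]
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    # Upper tail of the chi-square distribution with 1 d.o.f.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return shared_early / row_totals[0], shared_late / row_totals[1], chi2, p_value

# Bird 1: 22 shared / 83 specific early (62-72 dph), 10 shared / 90 specific late (81-112 dph)
early, late, chi2, p = shared_fraction_test(22, 83, 10, 90)
print(f"shared fraction {early:.2f} -> {late:.2f}, chi2 = {chi2:.2f}, P = {p:.3f}")
```

Under this assumed test, bird 1's counts give P ≈ 0.03, and bird 2's counts reported later (22 shared / 21 specific early, 4 / 28 late) give P ≈ 5 × 10^-4; this agreement is suggestive but does not confirm which test the authors used.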

The tendency of bird 1 to alternate between syllables β and γ means that syllable-specific neurons had an inter-burst interval, and thus a period, twice as long as that observed in the earlier protosyllable stage (Fig. 3c-f, Extended Data Fig. 3f-j). Therefore, increasing the period of neural activity by skipping or alternating cycles of an underlying rhythm seems to be a basis for the increase in song period during vocal learning.
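The period-doubling argument can be illustrated with idealized, hypothetical burst trains (not recorded data). A neuron that bursts on every protosyllable cycle has an inter-burst interval equal to the cycle period, whereas a neuron that participates only on alternate cycles (only during β, or only during γ) has an interval twice as long. The 100 ms cycle and 40 ms latency below are assumed values.

```python
from statistics import median

CYCLE_MS = 100.0    # assumed protosyllable cycle period
LATENCY_MS = 40.0   # assumed burst latency within each cycle
N_CYCLES = 20

def inter_burst_intervals(burst_times):
    """Intervals between consecutive burst times."""
    return [b - a for a, b in zip(burst_times, burst_times[1:])]

cycle_onsets = [k * CYCLE_MS for k in range(N_CYCLES)]
# Cycles alternate beta, gamma, beta, gamma, ...
protosequence_neuron = [t + LATENCY_MS for t in cycle_onsets]  # bursts on every cycle
shared_neuron = list(protosequence_neuron)                     # same latency in beta and gamma
beta_specific = [t + LATENCY_MS for k, t in enumerate(cycle_onsets) if k % 2 == 0]
gamma_specific = [t + LATENCY_MS for k, t in enumerate(cycle_onsets) if k % 2 == 1]

for name, bursts in [("protosequence", protosequence_neuron), ("shared", shared_neuron),
                     ("beta-specific", beta_specific), ("gamma-specific", gamma_specific)]:
    print(name, median(inter_burst_intervals(bursts)), "ms")
```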

Although our key findings are described above for bird 1, a similar 
pattern of HVC coding by shared and specific neurons was seen in a
total of 6 birds for which recordings were made during the emergence 
of multiple syllable types (birds 1-6; 185 shared neurons and 496 spe- 
cific neurons for 8 syllable pairs analysed). Across three birds in which 
neurons were also recorded in later song stages, there was a significant 
decrease in the fraction of shared neurons during syllable development 
(n=5 syllable pairs; P=3 x 10~°; birds 1, 2 and 4). Neurons exhibiting 
an increased burst period by skipping cycles of an underlying rhythm 
were observed in 4 of the 6 birds (birds 1, 3, 4 and 6). 


Splitting in other learning strategies 

Behavioural studies have shown that new syllable types can emerge through several distinct developmental strategies. The bird described above (bird 1) used the ‘serial repetition’ strategy and ‘sound differentiation in situ’ to develop two new syllables by alternating increasingly different variants of the protosyllable. Alternatively, birds can acquire multiple syllables simultaneously to form an entire motif (‘motif strategy’), or form new syllables at bout edges (onset or offset). We wondered whether the splitting of neural sequences underlies these other strategies as well.

Neural recordings were obtained in three birds (birds 1, 2 and 5) that exhibited bout-onset syllable formation. We focus here on bird 2, in which projection neurons were recorded throughout song development (57-84 dph). Tracking of syllable structure (Extended Data Fig. 6) revealed that syllables A and B of the adult song derived from a common, rhythmically repeated protosyllable (labelled α; Fig. 4a, b), and that syllable B arose from the first repetition of α at bout onset (Fig. 4c, d). The bout-onset syllable emerged as a distinct syllable type (labelled β) by fusion of this first α with a brief vocal element ε at bout onset (Fig. 4c, d and Extended Data Fig. 6a-e).

Figure 3 | Shared and specific sequences during the emergence of multiple syllable types. All data are from bird 1. a, Song examples during the emergence of syllables β (red) and γ (blue). Panels show, from top to bottom, the subsong stage (46 dph), rhythmic repetition of protosyllable α (grey bars; 58 dph), rhythmic repetition of variants of the protosyllable (β and γ; 60 dph), and further acoustic differentiation of β and γ (red and blue bars; 62 dph). b, Scatter plot of syllable duration versus mean pitch goodness (each dot is one syllable rendition; n = 400 syllables per day; unclassified syllables in grey). c, Neuron recorded during the protosyllable stage (HVC; 56 dph). d, β-specific neuron (HVCX; 64 dph). e, Shared neuron active during both β and γ (HVCRA; 68 dph). f, Simultaneously recorded pair of HVCX neurons: shared neuron (top) and γ-specific neuron (bottom; 71 dph). g, Raster of 105 projection neurons early in syllable differentiation, showing shared and specific sequences. HVCRA neurons are indicated by circles at right. h, Same as g but for 100 neurons recorded after differentiation of β and γ into adult syllables B and C. Scale bars for panels c-f, 0.5 mV; all panels have the same time scale.

Figure 4 | Shared and specific sequences during the emergence of a new syllable at bout onset. All data are from bird 2. a, Schematic of syllable formation. b, Scatter plot of mean pitch goodness of syllables α (red) and β (blue) through development (n = 100 syllables per day; horizontal jitter added to improve data visibility). c, Bout-onset neuron active before element ε (HVCRA; 64 dph). d, New syllable β formed by fusion of ε and α. Neuron shared between α and β (HVCRA; 65 dph). e, Neuron shared between α and β (HVCX; 70 dph). f, A-specific neuron (HVCRA; 80 dph). g, B-specific neuron (HVCRA; 73 dph). h, Population raster plot of 43 projection neurons recorded early in the emergence of syllable β, showing shared and specific sequences. i, Raster plot of 32 neurons recorded after differentiation of β and α into adult syllables B and A. Scale bars for panels c-g, 0.5 mV; all panels have the same time scale.

To examine the neural mechanisms underlying the emergence of the new syllable β at bout onsets, we analysed the firing patterns of 125 HVC projection neurons. Before the emergence of syllable β, the majority of recorded projection neurons participated in a rhythmic protosequence (Fig. 2b; n = 28/35 neurons; 57-64 dph). A different subset of neurons was active at bout onsets (Fig. 4c; 4 of 35 neurons). After the reliable emergence of β at bout onsets, roughly half of the projection neurons generated bursts during both syllables α and β (65-72 dph; Fig. 4d, e; n = 22 ‘shared’ neurons; 21 ‘specific’ neurons). These shared neurons produced nearly identical sequences during the two syllables (Fig. 4h, Extended Data Fig. 4c). Later in song development (73-84 dph), we observed a smaller fraction of shared neurons (n = 4 ‘shared’ neurons; P = 5 × 10^-4) and a correspondingly larger fraction of syllable-specific neurons (Fig. 4f, g, i; n = 28 ‘specific’ neurons), consistent with a gradual splitting of the protosequence into increasingly non-overlapping ‘daughter’ sequences. Evidence for sequence splitting during bout-onset differentiation was also observed in birds 1 and 5 (Extended Data Fig. 7).

Note that the bout-onset differentiation in bird 1 occurred after the earlier emergence of syllables β and γ (Fig. 3), suggesting that new syllables may emerge in a hierarchical process, that is, by the splitting of sequences that are themselves the product of an earlier splitting process (Extended Data Fig. 7).
We were able to examine whether neural sequence
splitting also underlies the ‘motif strategy’ of song learning in two
birds (birds 3 and 4; Extended Data Figs 8 and 9). In both birds, neural 
recordings showed the existence of rhythmically bursting neurons in 
the protosyllable stage (Extended Data Figs 8e and 9e, f). After the 
emergence of multiple syllable types, every syllable in the emerging 
motif had at least one neuron that was shared with another syllable at 
similar latencies (Extended Data Figs 8f-j and 9g-o), consistent with 
the view that all of these syllables arose from the simultaneous splitting 
of a common protosequence. 


Mechanistic model and discussion 

We propose a mechanistic model of learning in the HVC network to describe how sequences emerge during song development. This model is based on the idea that sequential bursting results from the propagation of activity through a continuous synaptically connected chain of neurons within HVC. It also captures non-uniformities such as increased burst density at syllable onsets, as formulated in a perspective of HVC function emphasizing vocal gestures.

Modelling studies have shown that a combination of two synaptic plasticity rules, spike-timing-dependent plasticity (STDP) and heterosynaptic competition, can transform a randomly connected network into a feedforward synaptically connected chain that generates sparse sequential activity. We hypothesize that the same mechanisms can drive the formation of a rhythmic protosyllable chain, and subsequently split this chain into multiple daughter chains for different syllable types. To test this hypothesis, we constructed a simple network of binary units representing HVC projection neurons.
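The two plasticity rules can be sketched for binary units as follows. This is not the authors' model code: the one-time-step antisymmetric STDP window, the learning rate, the weight cap, and the use of multiplicative rescaling to cap each unit's summed incoming and outgoing weight (standing in for heterosynaptic competition) are all assumptions made for illustration. Driving units 0-3 repeatedly in a fixed order strengthens forward connections and weakens backward ones, the first step toward a feedforward chain.

```python
import random

ETA, W_MAX, LIMIT = 0.05, 1.0, 1.5  # assumed learning rate, weight cap, summed-weight cap

def stdp_step(W, x_prev, x_next, eta=ETA, w_max=W_MAX):
    """Antisymmetric STDP for binary units with a one-time-step window.

    W[i][j] is the weight from presynaptic unit j to postsynaptic unit i.
    j active at t and i active at t+1 potentiates W[i][j]; the reverse order depresses it.
    """
    n = len(W)
    for i in range(n):
        for j in range(n):
            if i != j:
                dw = eta * (x_next[i] * x_prev[j] - x_prev[i] * x_next[j])
                W[i][j] = min(w_max, max(0.0, W[i][j] + dw))

def heterosynaptic_competition(W, limit=LIMIT):
    """Cap each unit's summed incoming and outgoing weight by rescaling (a simplification)."""
    n = len(W)
    for i in range(n):
        total_in = sum(W[i])
        if total_in > limit:
            W[i] = [w * limit / total_in for w in W[i]]
    for j in range(n):
        total_out = sum(W[i][j] for i in range(n))
        if total_out > limit:
            for i in range(n):
                W[i][j] *= limit / total_out

def one_hot(k, n):
    return [1 if i == k else 0 for i in range(n)]

rng = random.Random(0)
N = 6
W = [[0.0 if i == j else rng.uniform(0.0, 0.2) for j in range(N)] for i in range(N)]
ORDER = [0, 1, 2, 3]  # hypothetical firing order imposed on units 0-3
for epoch in range(50):
    for a, b in zip(ORDER, ORDER[1:]):
        stdp_step(W, one_hot(a, N), one_hot(b, N))
        heterosynaptic_competition(W)

for a, b in zip(ORDER, ORDER[1:]):
    print(f"forward W[{b}][{a}] = {W[b][a]:.2f}   backward W[{a}][{b}] = {W[a][b]:.2f}")
```

The competition step is what keeps one unit from accumulating strong inputs from everyone; in this toy it only bounds the totals, whereas in the model it also drives the splitting described below.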

The model neurons are initially connected with random excitatory weights, representing the subsong stage. We hypothesize that a subset of HVC neurons receives an external input at syllable onsets and serves as a seed from which chains grow during later learning stages. Before learning, activation of these seed neurons produced a transiently propagating sequence of network activity that decayed rapidly (within tens of milliseconds; Fig. 5a).

In the next stage, the network is trained to produce a single protosyllable by activating seed neurons rhythmically (100 ms period). The connections are modified according to the learning rules described above. As a result, connections were strengthened along the population of neurons sequentially activated after syllable onsets, resulting in the growth of a feedforward synaptically connected chain that supported stable propagation of activity (Fig. 5b).
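The contrast between rapidly decaying activity before learning and stable propagation along a chain can be illustrated with binary threshold units. In this sketch, which is an assumption rather than the published model, a unit fires on the next time step if its summed input from active units reaches a threshold. A weak random network fails to relay the seed activity, whereas a feedforward chain of five-unit groups relays it from group to group. Group size, weights and threshold are hypothetical.

```python
import random

N_GROUPS, GROUP_SIZE, THETA = 8, 5, 1.0
N = N_GROUPS * GROUP_SIZE

def propagate(W, seed_units, steps, theta=THETA):
    """Return the number of active units at each time step, starting from the seed."""
    x = [1 if i in seed_units else 0 for i in range(len(W))]
    counts = [sum(x)]
    for _ in range(steps):
        x = [1 if sum(W[i][j] for j in range(len(W)) if x[j]) >= theta else 0
             for i in range(len(W))]
        counts.append(sum(x))
    return counts

rng = random.Random(0)
seed = set(range(GROUP_SIZE))  # group 0 acts as the seed

# "Subsong" network: weak random excitatory weights (at most 0.1 each)
W_random = [[0.0 if i == j else rng.uniform(0.0, 0.1) for j in range(N)] for i in range(N)]
# "Trained" network: each group projects with weight 0.3 onto the next group only
W_chain = [[0.3 if i // GROUP_SIZE == j // GROUP_SIZE + 1 else 0.0 for j in range(N)]
           for i in range(N)]

print("random:", propagate(W_random, seed, steps=N_GROUPS - 1))
print("chain: ", propagate(W_chain, seed, steps=N_GROUPS - 1))
```

In the random network the seed's summed input to any unit is at most 0.5, below threshold, so activity dies after the first step; in the chain each group delivers 1.5 to the next and activity travels the full length.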

We found that this single chain could be induced to split into two daughter chains by dividing the seed neurons into two groups that were activated on alternate cycles of the rhythm (Fig. 5c, d and Supplementary Video 1). Local inhibition and synaptic competition were also increased (see Methods). During the splitting process, we observed neurons specific to each of the emerging syllable types, as well as shared neurons that were active at the same latencies in both syllable types (Fig. 5c). Just as in our data, over the course of development the distribution of burst latencies in the model continued to broaden (Fig. 5e), and the fraction of shared neurons decreased (Fig. 5c, d). The average period of rhythmic bursting in model neurons increased during chain splitting, as neurons became specific for one emerging syllable type and began to participate only on alternate cycles of the protosyllable rhythm (Fig. 5d and Extended Data Fig. 10g, h).

Figure 5 | A neural model of sequence formation and splitting in HVC. a-d, Top, network diagrams of participating neurons (darker lines indicate stronger connections; magenta boxes indicate seed neurons). Bottom, raster plot of neurons showing shared and specific sequences. Neurons are sorted by relative latency. Magenta arrows indicate groups of seed neurons. a, Subsong stage: activation of seed neurons produces a rapidly decaying burst of sequential activity. b, Protosyllable stage: rhythmic activation of seed neurons induces formation of a protosyllable chain. c, Alternating activation of red and blue seed neurons and synaptic competition drives the network to split into two chains (specific neurons, red and blue; shared neurons, black). d, Network after chain splitting. e, Distribution of model burst latencies during the subsong stage, protosyllable stage and chain-splitting stage (early and late combined).

Our model can reproduce other strategies by which birds learn new syllable types. We implemented bout-onset differentiation in the model by also including a population of seed neurons activated at bout onsets (see Figs 1d and 4c, and Extended Data Fig. 10a). This caused the protosyllable chain to split in such a way that one daughter chain was reliably activated only at bout onsets, while the other daughter chain was active only on subsequent syllables (Extended Data Fig. 10a-d and Supplementary Video 2). Our model was also able to simulate the simultaneous emergence of a three-syllable motif (‘motif strategy’) by dividing the seed neurons into three subpopulations (Extended Data Fig. 10e-h).

Our data and modelling also support the possibility of syllable formation by mechanisms other than sequence splitting. For example, in several birds, a short vocal element emerged at bout onsets that did not seem to differentiate acoustically from the protosyllable, and thus did not reflect bout-onset differentiation (for example, ‘E’ in bird 1, Extended Data Fig. 7a; or ‘C’ in bird 2, Extended Data Fig. 6a, b). We found that, with different learning parameters, bout-onset seed neurons in our model can induce the formation of a new syllable chain at bout onset, rather than bout-onset differentiation (Extended Data Fig. 10i-k).

In summary, our model of learning in a simple sequence-generating network captures transformations that underlie the formation of new syllable types via a diverse set of learning strategies.


Possible role of sequence splitting

The process of splitting a prototype neural sequence allows learned components of a prototype motor program to be reused in each of the daughter motor programs. For example, one of the earliest aspects of vocal learning is the coordination between singing and breathing, specifically the alternation between vocalized expiration and non-vocalized inspiration typical of adult song. The protosequence in HVC would allow the bird to learn the appropriate coordination of respiratory and vocal musculature. Duplication of the protosequence through splitting would result in two ‘functional’ daughter sequences, each already capable of proper vocal/respiratory coordination, and each suitable as a substrate for rapid learning of a new syllable type. This proposed mechanism resembles a process thought to underlie the evolution of novel gene functions: gene duplication followed by divergence through independent mutations. Similarly, for the acquisition of complex behaviours, the duplication of neural sequences by splitting, followed by independent differentiation through learning, may provide a mechanism for constructing complex motor programs.

Online Content Methods, along with any additional Extended Data display items and Source Data, are available in the online version of the paper; references unique to these sections appear only in the online paper.
METHODS


Animals. We used juvenile male zebra finches (Taeniopygia guttata) 44-112 
days post-hatch (dph) singing undirected song (n = 32 birds). Animals were not 
divided into experimental groups; thus, randomization and blinding were not nec- 
essary. No statistical methods were used to predetermine sample size. Birds were 
obtained from the Massachusetts Institute of Technology zebra finch breeding 
facility (Cambridge, Massachusetts). The care and experimental manipulation 
of the animals were carried out in accordance with guidelines of the National 
Institutes of Health and were reviewed and approved by the Massachusetts Institute 
of Technology Committee on Animal Care. 

All the juvenile birds were raised by their parents in individual breeding cages until 38 ± 5.2 dph (mean ± s.d.), when they were removed and singly housed in custom-made sound isolation chambers (maintained on a 12:12 h day-night schedule). For a subset of the birds (birds 1, 2 and 4), additional tutoring was carried out after removal from the breeding cages to facilitate song imitation. This was done by playback of the tutor song through a speaker (20 bouts per day). Additional tutoring was done for 12 days for bird 1, 7 days for bird 2, and 18 days for bird 4.
Bird identification key: bird 1, to3965; bird 2, to3779; bird 3, to3017; bird 4, 
to5640; bird 5, to3396; bird 6, to2309; bird 7, to3412; bird 8, to3567; bird 9, to2462; 
bird 10, to2331; bird 11, to2427; bird 12, to3352. 

To compare the activity of HVC projection neurons in juvenile birds with that of adult birds, we also included neurons recorded in adults (>120 dph, n = 3 birds), including a reanalysis of previously published HVC recordings performed in adult male zebra finches singing directed song.

Song recordings. Songs were recorded with Sound Analysis Pro or custom-written MATLAB software (A. Andalman), which was configured to ensure that recordings were triggered on all quiet vocalizations of juvenile birds. The vertical axis range for all spectrograms is 500-8,000 Hz.

Classification of song stages. We classified each day of juvenile singing into one of four song stages: subsong stage, protosyllable stage, multi-syllable stage and motif stage (Extended Data Fig. 1a). The subsong stage (48 ± 4 dph, median ± interquartile range, IQR) is defined as having a syllable duration distribution well fit by an exponential distribution, with an upper limit of 6 for the Lilliefors goodness-of-fit statistic. Following the subsong stage, birds enter the protosyllable stage (58 ± 10 dph, median ± IQR), characterized by the presence of syllables with consistent timing, reflected in a peak in the distribution of syllable durations. The onset of the protosyllable stage was defined here as the first day on which the syllable duration distribution deviated from an exponential distribution (Lilliefors goodness-of-fit statistic greater than 6). Following the protosyllable stage, birds transition to the multi-syllable stage (62 ± 12 dph, median ± IQR), in which multiple distinct syllable types are visible in the song spectrogram and as multiple clusters in a scatter plot of syllable features (for example, Fig. 3a, b; 62 dph). The motif stage (73 ± 21 dph, median ± IQR) was defined by the production of a sequence of syllables in a relatively fixed order. Finally, songs recorded in birds older than 120 dph were assigned to the adult stage. We used a slightly older cutoff than the typical definition of adulthood in zebra finches (~90 dph) because some of our birds in the 90-120 dph range continued to undergo small developmental changes, as has been reported previously.

Syllable segmentation and bout extraction. Syllables in juvenile song were segmented based on the song power in a spectral band between 1 and 4 kHz, as described previously. In a few cases, the cutoff frequencies of the band-pass filters were adjusted to avoid including high-frequency inspiratory sounds. Introductory notes were removed manually to avoid including HVC neurons that are rhythmically active during these elements. Song bouts were defined as continuous sequences of syllables separated by gaps no longer than 300 ms. Bout onset was defined as the onset of the first syllable in the bout, and bout offset as the offset of the last syllable in the bout.

Syllable segmentation based on song rhythmicity (‘phase segmentation’). For bird 3 (‘motif strategy’), it was difficult to segment syllables consistently using previous methods based on setting a threshold on the sound amplitude. To overcome this limitation, we segmented syllables based on the phase of the rhythmicity in the song (‘phase segmentation’). The song rhythm spectrum, defined as the spectrum of the sound amplitude during singing, exhibited a peak around 9 Hz (Extended Data Fig. 8c). To estimate the instantaneous phase of this rhythm, we first band-pass filtered the sound amplitude (Extended Data Fig. 8c, d; second-order IIR resonator filter with a peak at 9 Hz and a −3 dB half-bandwidth of 3 Hz; MATLAB command iirpeak). The band-pass filtered signal was then processed using the Hilbert transform (MATLAB command hilbert) to compute the instantaneous amplitude and phase (Extended Data Fig. 8d). Next, we set a threshold on this instantaneous amplitude to find the rhythmic part of the song. Finally, within this rhythmic part, song was segmented by detecting threshold crossings of the instantaneous phase (Extended Data Fig. 8d, bottom). Phase segments that contained no sounds, or that contained calls, were manually removed. Similarly, phase segmentation (band-pass filter with a peak at 10 Hz and a half-bandwidth of 3 Hz) was used to segment the song during the protosyllable stage for bird 4 (Extended Data Fig. 9a, e, f). Note that this method is best suited to songs that have strong rhythmic modulation of song amplitude but in which syllable boundaries are not strongly rhythmic. This appeared to be typical of birds employing the ‘motif strategy’.
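A hedged Python sketch of the phase-segmentation steps, using SciPy's `iirpeak`, `filtfilt` and `hilbert` in place of the MATLAB commands. Several choices are assumptions. We read the "half-bandwidth of 3 Hz" as a full −3 dB bandwidth of 6 Hz, giving SciPy's quality factor Q = f0 / 6. Zero-phase filtering with `filtfilt` is our choice. The amplitude threshold value is hypothetical. Finally, we place segment boundaries where the instantaneous phase wraps from +π to −π; for an amplitude envelope that peaks at phase 0, these wraps fall in the troughs between syllables.

```python
import numpy as np
from scipy.signal import filtfilt, hilbert, iirpeak

def phase_segment(amplitude, fs, f0=9.0, half_bw=3.0, amp_threshold=0.1):
    """Return segment boundary times (s) from the phase of the song-amplitude rhythm."""
    # Second-order resonator centred on f0; Q = f0 / (-3 dB bandwidth), bandwidth assumed = 2 * half_bw
    b, a = iirpeak(f0, Q=f0 / (2.0 * half_bw), fs=fs)
    rhythm = filtfilt(b, a, amplitude - np.mean(amplitude))  # zero-phase band-pass
    analytic = hilbert(rhythm)
    inst_amp = np.abs(analytic)
    inst_phase = np.angle(analytic)  # radians in (-pi, pi]
    # Phase "threshold crossing": a wrap from +pi to -pi appears as a jump below -pi
    wraps = np.flatnonzero(np.diff(inst_phase) < -np.pi) + 1
    # Keep only boundaries inside the rhythmic part of the song
    wraps = wraps[inst_amp[wraps] > amp_threshold]
    return wraps / fs

fs = 1000.0
t = np.arange(0.0, 2.0, 1.0 / fs)
amplitude = 1.0 + np.cos(2 * np.pi * 9.0 * t)  # synthetic 9 Hz amplitude modulation
boundaries = phase_segment(amplitude, fs)
print(len(boundaries), np.round(boundaries[:3], 3))
```

On this synthetic envelope the boundaries land roughly one rhythm period (about 111 ms) apart, near the amplitude troughs.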

Syllable classification and labelling. Protosyllables were defined by their characteristic durations, as described previously. In short, to identify the protosyllables, we first subtracted the best-fit exponential distribution (fitted using durations of 200-400 ms) from the syllable duration distribution, and fitted a Gaussian distribution to this residual. Protosyllables were defined as syllables with durations within two standard deviations of the mean of this Gaussian distribution. We labelled protosyllables with the Greek letter ‘α’ in all birds for consistency.
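The protosyllable criterion can be sketched as below. Two simplifications are assumptions, not the authors' procedure. First, the exponential is fitted by least squares on log(count) over bins with centres in 200-400 ms. Second, the Gaussian's mean and SD are estimated from the moments of the non-negative residual rather than by a nonlinear fit. The synthetic histogram is hypothetical.

```python
import math

def protosyllable_window(centres_ms, counts, fit_range_ms=(200.0, 400.0)):
    """Return (mean, sd, lower, upper) of the protosyllable duration window (mean +/- 2 SD)."""
    # Exponential fit: linear regression of log(count) on bin centre within fit_range_ms
    pts = [(c, math.log(n)) for c, n in zip(centres_ms, counts)
           if fit_range_ms[0] <= c < fit_range_ms[1] and n > 0]
    mx = sum(x for x, _ in pts) / len(pts)
    my = sum(y for _, y in pts) / len(pts)
    slope = sum((x - mx) * (y - my) for x, y in pts) / sum((x - mx) ** 2 for x, _ in pts)
    intercept = my - slope * mx
    # Residual after subtracting the fitted exponential (negative values clipped to 0)
    residual = [max(0.0, n - math.exp(intercept + slope * c)) for c, n in zip(centres_ms, counts)]
    # Moment-based Gaussian estimate (stand-in for a least-squares Gaussian fit)
    total = sum(residual)
    mean = sum(r * c for r, c in zip(residual, centres_ms)) / total
    sd = math.sqrt(sum(r * (c - mean) ** 2 for r, c in zip(residual, centres_ms)) / total)
    return mean, sd, mean - 2 * sd, mean + 2 * sd

def is_protosyllable(duration_ms, window):
    _, _, lower, upper = window
    return lower <= duration_ms <= upper

# Synthetic histogram (10 ms bins): exponential "subsong-like" part plus a peak near 125 ms
centres = [5.0 + 10.0 * k for k in range(50)]
counts = [200.0 * math.exp(-c / 100.0) + 266.0 * math.exp(-0.5 * ((c - 125.0) / 15.0) ** 2)
          for c in centres]
window = protosyllable_window(centres, counts)
print([round(v, 1) for v in window])
```

With real data the histogram would be built from measured syllable durations, and noise in the short-duration bins can bias the moment estimate; a proper Gaussian fit is more robust there.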

To label the emerging syllables in the juvenile song, we used the Greek letters β, γ, δ and ε. In contrast, to label the syllables in the adult motif, we used the capital Latin letters A, B, C, etc. For birds in which the song learning trajectory was tracked developmentally, we labelled the syllables such that the correspondence between juvenile and adult syllables is straightforward: for example, α becomes A, β becomes B, γ becomes C, δ becomes D and ε becomes E. Note that this labelling scheme leads to a slightly unconventional labelling of adult song, in the sense that a motif can have letters in reverse order (for example, CBA in Fig. 4f, g; Extended Data Fig. 6a), or a motif might not have a syllable A (for example, EDCB in Extended Data Fig. 7a).

Syllable labelling was done manually by visual inspection of the song spectrogram, blind to the neural activity. The existence of multiple distinct syllable types was confirmed by calculating the syllable duration and acoustic features commonly used to analyse birdsong syllables, and visualizing the clusters of syllables in a two-dimensional space (Fig. 3b, Extended Data Figs 8b and 9d). In some cases, syllable order was used as an additional indicator of syllable identity (for example, Extended Data Fig. 7a, 70 dph; Extended Data Fig. 8a, 51 dph; Extended Data Fig. 9a, 59 dph).

In bird 1, syllables β and γ were labelled manually by visual inspection of the song spectrogram (Fig. 3a). Because the characterization of shared and specific neurons depends on reliable syllable labelling, we took a conservative approach and labelled only syllables that were clearly identifiable, leaving ambiguous syllables unlabelled (fraction of syllables labelled as β or γ during 62-66 dph: 70 ± 5.5%, mean ± s.d.). We then estimated the error rate of our labelling procedure by plotting the labelled syllables (n = 200 syllables per type on each day) in a two-dimensional space of syllable duration and mean pitch goodness (Fig. 3b), and obtaining a decision boundary using linear discriminant analysis. We used the mismatch between manual labelling and feature-based labelling to estimate the error rate for syllables β and γ. The error rate during the first five days of syllable differentiation (62-66 dph), when labelling was most difficult, was only 1.1% on average (range: 0.25-3.0%).
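The error-rate estimate can be sketched with a two-class Fisher linear discriminant on the two features (syllable duration in ms, mean pitch goodness). The pooled covariance, the equal class priors with a boundary midway between the projected class means, and the synthetic clusters are all assumptions; the section does not specify these details.

```python
def fit_lda(class0, class1):
    """Two-class Fisher LDA on 2-D features with a pooled covariance.

    Returns (w, c): a point x is assigned to class 1 when w . x > c
    (equal priors; boundary midway between the projected class means).
    """
    def mean(points):
        return [sum(p[k] for p in points) / len(points) for k in range(2)]

    m0, m1 = mean(class0), mean(class1)
    s = [[0.0, 0.0], [0.0, 0.0]]
    for points, m in ((class0, m0), (class1, m1)):
        for p in points:
            d = (p[0] - m[0], p[1] - m[1])
            for a in range(2):
                for b in range(2):
                    s[a][b] += d[a] * d[b]
    dof = len(class0) + len(class1) - 2
    s = [[v / dof for v in row] for row in s]
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    inv = [[s[1][1] / det, -s[0][1] / det], [-s[1][0] / det, s[0][0] / det]]
    diff = (m1[0] - m0[0], m1[1] - m0[1])
    w = (inv[0][0] * diff[0] + inv[0][1] * diff[1], inv[1][0] * diff[0] + inv[1][1] * diff[1])
    c = w[0] * (m0[0] + m1[0]) / 2 + w[1] * (m0[1] + m1[1]) / 2
    return w, c

def mismatch_rate(points, manual_labels, w, c):
    """Fraction of syllables whose manual label disagrees with the LDA label."""
    lda_labels = [1 if w[0] * p[0] + w[1] * p[1] > c else 0 for p in points]
    return sum(a != b for a, b in zip(lda_labels, manual_labels)) / len(points)

# Hypothetical (duration ms, mean pitch goodness) clusters for two syllable types
class0 = [(80 + dx, 0.30 + dy) for dx in (-5, 0, 5) for dy in (-0.02, 0.0, 0.02)]
class1 = [(150 + dx, 0.60 + dy) for dx in (-5, 0, 5) for dy in (-0.02, 0.0, 0.02)]
w, c = fit_lda(class0, class1)
points = class0 + class1
manual = [0] * len(class0) + [1] * len(class1)
print(mismatch_rate(points, manual, w, c))  # 0.0 for these well-separated clusters
```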

For the second round of differentiation in bird 1, syllable order was used to assist in labelling syllables at early stages, when syllables ‘B’ and ‘D’ were not easily distinguishable based on acoustic differences. Because these syllables underwent bout-onset differentiation, the first β after bout onset was labelled ‘D’, and later renditions of β in the bout were labelled ‘B’ (Extended Data Fig. 7a).

In bird 2, several emerging syllables could be easily distinguished based on syllable durations (Extended Data Fig. 6d). Specifically, syllables with durations of 110-160 ms and 180-250 ms were defined as α and β, respectively. Syllables 10-75 ms in duration were labelled γ if they were followed by a β, and ε otherwise.
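These duration rules translate directly into code. The sketch assumes inclusive range boundaries, a list of durations for one bout in temporal order, and that "followed by" means the next syllable in the same bout; durations outside all ranges are left unlabelled (None). The function name is hypothetical.

```python
def label_bird2(durations_ms):
    """Label one bout's syllables (in temporal order) using bird 2's duration rules."""
    labels = []
    for d in durations_ms:
        if 110 <= d <= 160:
            labels.append("alpha")
        elif 180 <= d <= 250:
            labels.append("beta")
        else:
            labels.append(None)
    # Short elements (10-75 ms) depend on the identity of the next syllable
    for k, d in enumerate(durations_ms):
        if 10 <= d <= 75:
            followed_by_beta = k + 1 < len(labels) and labels[k + 1] == "beta"
            labels[k] = "gamma" if followed_by_beta else "epsilon"
    return labels

print(label_bird2([40, 200, 130, 130, 60]))
# -> ['gamma', 'beta', 'alpha', 'alpha', 'epsilon']
```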
Chronic neural recordings. Single-unit recordings of HVC projection neurons during singing were carried out using a motorized microdrive described previously. Single units were confirmed by the presence of a refractory period in the inter-spike interval (ISI) distribution (Extended Data Fig. 1b). Neurons that were active only during distance calls and not during singing were excluded from the analysis. In addition, neurons recorded for less than 5 s of singing were excluded, because the short recording duration did not allow us to reliably quantify their activity patterns.

Antidromic identification of HVC projection neurons was carried out with bipolar stimulating electrodes implanted in RA and Area X (single pulse of 200 µs every 1 s; current amplitude: 50-500 µA). A subset of antidromically identified projection neurons was further validated with collision testing. A different subset of single units was identified as putative projection neurons based on sparse bursting, but could not be antidromically identified because they did not respond to antidromic stimulation or were lost before antidromic identification could be carried out (211 of 1,149 neurons). These neurons were included
in the data set as unidentified HVC projection neurons (HVC,). 
Analysis of neural activity. Spikes were sorted offline using custom MATLAB
software (D. Aronov). 

Definition of bursts. HVC projection neurons exhibited bursts of action potentials during singing (Fig. 1a-c). The bursting nature of these neurons was evident in the inter-spike interval (ISI) distribution during singing, which exhibited two peaks with an inter-peak minimum near 30 ms (Extended Data Fig. 1b). We defined a ‘burst’ as a continuous group of spikes separated by intervals of 30 ms or less. Thus, by definition, bursts are separated from other spikes by intervals greater than 30 ms. Note that single spikes separated by more than 30 ms from both the preceding and the following spike were also counted as bursts. Burst time was defined as the centre of mass of all the spikes within the burst. Burst width was defined as the interval between the first and the last spike in a burst (Extended Data Fig. 1c, top). Firing rate during a burst was defined as the reciprocal of the mean inter-spike interval in the burst (Extended Data Fig. 1c, bottom). For the calculation of burst width and firing rate during bursts, bursts composed of a single spike were excluded.
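The burst definition above can be sketched as follows, assuming spike times in milliseconds and treating "centre of mass" as the mean of the spike times (each spike weighted equally); function names are hypothetical.

```python
def find_bursts(spike_times_ms, max_isi_ms=30.0):
    """Split a spike train into bursts: spikes separated by <= max_isi_ms belong together."""
    bursts, current = [], []
    for t in sorted(spike_times_ms):
        if current and t - current[-1] > max_isi_ms:
            bursts.append(current)
            current = []
        current.append(t)
    if current:
        bursts.append(current)
    return bursts

def burst_properties(burst):
    """Return (burst time, width, firing rate in Hz); width and rate are None for single spikes."""
    burst_time = sum(burst) / len(burst)  # centre of mass, each spike weighted equally
    if len(burst) < 2:
        return burst_time, None, None
    width = burst[-1] - burst[0]
    mean_isi = width / (len(burst) - 1)   # the consecutive intervals sum to the width
    return burst_time, width, 1000.0 / mean_isi

spikes = [100.0, 102.0, 104.0, 106.0, 300.0, 500.0, 503.0]
for b in find_bursts(spikes):
    print(b, burst_properties(b))
```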

Syllable-related neural activity. To analyse the temporal relation between neural activity and song syllables, we aligned the spike times to syllable onsets and constructed a rate histogram (1 ms bins, smoothed over 20 bins; range: ±0.5 s from syllable onsets). The peak in this rate histogram was found between 50 ms before and 200 ms after syllable onset. To test the significance of this peak, surrogate histograms were created by adding different random time shifts to the spike times on each trial. Random time shifts were drawn from a uniform distribution over ±0.5 s. The peak of each surrogate histogram was recorded, and this shuffling procedure was repeated 1,000 times; P values were obtained as the fraction of surrogate peaks that were larger than the peak of the real data, and P < 0.05 was considered significant.
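A sketch of this shuffle test in Python. Three details are assumptions not stated in the text: the smoothing kernel is a centred 20-bin boxcar; shifted spikes wrap around within the ±0.5 s range so that none are lost; and the example uses 200 shuffles only to keep it fast (the default of 1,000 matches the text). The example neuron is synthetic.

```python
import random

def peak_rate(trials, window_s=0.5, bin_s=0.001, smooth_bins=20, search=(-0.05, 0.2)):
    """Peak of the onset-aligned rate histogram (spikes/s per trial) within the search window.

    trials: list of spike-time lists (s), each relative to one syllable onset.
    """
    n_bins = int(round(2 * window_s / bin_s))
    counts = [0] * n_bins
    for spikes in trials:
        for t in spikes:
            k = int((t + window_s) / bin_s)
            if 0 <= k < n_bins:
                counts[k] += 1
    cumulative = [0]
    for c in counts:
        cumulative.append(cumulative[-1] + c)
    half = smooth_bins // 2
    first = int(round((search[0] + window_s) / bin_s))
    last = int(round((search[1] + window_s) / bin_s))
    best = 0.0
    for k in range(first, last + 1):
        lo, hi = max(0, k - half), min(n_bins, k + half)  # centred boxcar
        best = max(best, (cumulative[hi] - cumulative[lo]) / ((hi - lo) * bin_s * len(trials)))
    return best

def locking_p_value(trials, n_shuffles=1000, window_s=0.5, seed=0):
    """Shuffle test: P = fraction of surrogate peaks larger than the real peak."""
    rng = random.Random(seed)
    real = peak_rate(trials, window_s=window_s)
    span = 2 * window_s
    larger = 0
    for _ in range(n_shuffles):
        surrogate = []
        for spikes in trials:
            shift = rng.uniform(-window_s, window_s)  # one random shift per trial
            # Assumption: shifted spikes wrap around within the +/-0.5 s range
            surrogate.append([(t + shift + window_s) % span - window_s for t in spikes])
        if peak_rate(surrogate, window_s=window_s) > real:
            larger += 1
    return real, larger / n_shuffles

rng = random.Random(1)
# Hypothetical neuron: a 3-spike burst ~50 ms after each onset plus 2 background spikes per trial
locked = [sorted([0.05 + rng.gauss(0, 0.002) for _ in range(3)] +
                 [rng.uniform(-0.5, 0.5) for _ in range(2)]) for _ in range(30)]
real_peak, p = locking_p_value(locked, n_shuffles=200)
print(f"peak = {real_peak:.0f} spikes/s, P = {p:.3f}")
```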

To visualize the population activity associated with protosyllables, we con- 
structed a population raster plot by choosing 20 protosyllable renditions for which 
each neuron was most active. Different neurons were plotted in different colours 
(Fig. 2b, Extended Data Figs 1n and 9k). For all the other population raster plots 
associated with identified syllables, 20 random renditions were chosen for display. 
For all population raster plots, syllable duration from each rendition was linearly 
time-warped to the mean duration of the syllable. Spike times were warped by 
the same factor. 
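The linear time-warping used for the population rasters can be sketched as follows. The data layout (each rendition as spike times plus syllable onset and offset, all in seconds on the same clock) and the seeded random selection of 20 renditions are assumptions for illustration.

```python
import random

def warp_rendition(spike_times, onset, offset, mean_duration):
    """Express spike times relative to onset, stretched so the rendition lasts mean_duration."""
    scale = mean_duration / (offset - onset)
    return [(t - onset) * scale for t in spike_times]

def warped_raster(renditions, n_display=20, seed=0):
    """renditions: list of (spike_times, onset, offset). Returns warped rows for up to n_display renditions."""
    mean_duration = sum(off - on for _, on, off in renditions) / len(renditions)
    rng = random.Random(seed)
    chosen = rng.sample(renditions, min(n_display, len(renditions)))
    return [warp_rendition(s, on, off, mean_duration) for s, on, off in chosen]

rows = warped_raster([([1.1, 1.2], 1.0, 1.2), ([2.15, 2.3], 2.0, 2.3)])
print(rows)  # spikes at each syllable offset map to the mean duration (0.25 s)
```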

Bout-related neural activity. A subset of HVC projection neurons exhib- 
ited bout-related activity: bursting before bout onsets and/or after bout offsets 
(Fig. 1d, e and Extended Data Fig. 2e-l). To quantify the pre-bout activity, we
generated histograms aligned to bout onsets (Extended Data Fig. 2f, g) and found 
the peak in the histogram in a 300 ms window before bout onset. We considered 
a neuron to be exhibiting ‘pre-bout activity’ if the size of this peak was significant 
(P <0.05) compared to peaks obtained from the shuffled surrogate histograms 
(identical to the procedure described earlier in the section Syllable-related neu- 
ral activity). To eliminate the possibility of including syllable-related activity as 
bout-related activity, we did not consider a neuron to be exhibiting pre-bout activ- 
ity if the neuron showed a peak in the bout-onset aligned histogram and a peak at a 
similar latency (less than 25 ms apart) in the syllable-onset aligned histogram. We 
considered a neuron to be exhibiting ‘post-bout activity’ if there was a significant 
peak in the bout-offset aligned histogram (Extended Data Fig. 2j, k) in a 300 ms 
window after bout-offset. 
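The pre-bout criterion combines a significance test with an exclusion rule for syllable-related peaks. The sketch below assumes the bout-aligned peak's P value has already been obtained with the shuffle test, and that the latencies of peaks in the syllable-onset-aligned histogram are supplied as a list; all names are hypothetical.

```python
def has_pre_bout_activity(bout_peak_latency, bout_peak_p, syllable_peak_latencies,
                          alpha=0.05, min_separation=0.025):
    """Return True if a neuron meets the pre-bout criteria described above.

    bout_peak_latency: latency (s) of the peak in the 300 ms window before bout onset.
    bout_peak_p: shuffle-test P value of that peak.
    syllable_peak_latencies: latencies (s) of peaks in the syllable-onset-aligned histogram.
    """
    if bout_peak_p >= alpha:
        return False
    # A syllable-aligned peak less than 25 ms away suggests syllable-related activity instead.
    return all(abs(bout_peak_latency - lat) >= min_separation
               for lat in syllable_peak_latencies)
```

The same function, applied to the 300 ms window after bout offset without the exclusion list, would cover the post-bout criterion.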

Quantification of the rhythmic neural activity. To quantify the rhythmic 
neural activity of HVC projection neurons, we used four different meth- 
ods: inter-burst interval, spike-train autocorrelation, spectrum of the spike 
train, and cepstrum of the spike train. Only spikes that were produced dur- 
ing singing (that is, between the onset of the first syllable and the offset of 
the last syllable in the bout) were used for the calculation of these measures. 
(1) Inter-burst interval (IBI). Intervals between burst times were calculated and
the peak between 80 and 1,000 ms was found. (2) Spike-train autocorrelation.
To quantify the second-order statistics of the firing pattern of HVC neurons,
the spike-train autocorrelation, expressed as a conditional firing rate, was
calculated, and the peak between 80 and 1,000 ms was found. The width of the central
peak indicates the width of bursts, and multiple side lobes at regular
intervals indicate rhythmic bursting. (3) Spectrum of the spike train. Rhythmicity
of the single-unit activity was also quantified in the frequency domain using
multi-taper spectral analysis of spike trains treated as point processes. We used
the Chronux software to calculate the spectrum of the spike trains. First, bouts
of singing were segmented into non-overlapping 1.5-s analysis windows,
and then the spectrum for each window was calculated using multi-taper spectral
analysis with time-bandwidth product NW = 3/2 and number of tapers K = 2.
To obtain the mean spectrum for a given neuron, spectra calculated from all the
analysis windows were averaged. Finally, we found the peak in the mean
spectrum within the range 2-15 Hz. (4) Cepstrum of the spike train. HVC projection
neurons typically exhibited brief rhythmic bursts with precise inter-burst intervals
(Fig. 1b, c). Thus, the spectrum of the spike train tended to have peaks at multiples
of the fundamental frequency. To represent these burst trains with regular
intervals more compactly, we calculated the cepstrum of the spike train (a technique
commonly used in speech processing to extract the period of glottal pulses),
defined as the inverse Fourier transform of the log spectrum, and found
the peak in the cepstrum between 80 and 1,000 ms.
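The spectral and cepstral measures can be sketched in NumPy. This is an approximation, not Chronux: Chronux treats spikes as a point process, whereas this sketch bins the spike train at 1 ms, subtracts the mean count in each window, and applies DPSS tapers computed from the standard tridiagonal eigenvalue problem (NW = 3/2, K = 2, as in the text). The small floor added before the logarithm, and capping the cepstral search at half the window length (the cepstrum of a real log spectrum is symmetric), are further assumptions.

```python
import numpy as np

FS = 1000  # samples per second for the binned spike train (1 ms bins, an assumption)


def dpss_tapers(n, nw=1.5, k=2):
    """Discrete prolate spheroidal sequences via the tridiagonal eigenproblem."""
    idx = np.arange(n)
    w = nw / n                                   # half-bandwidth in cycles per sample
    diag = ((n - 1 - 2 * idx) / 2.0) ** 2 * np.cos(2 * np.pi * w)
    off = idx[1:] * (n - idx[1:]) / 2.0
    mat = np.diag(diag) + np.diag(off, 1) + np.diag(off, -1)
    _, vecs = np.linalg.eigh(mat)                # eigenvalues in ascending order
    return vecs[:, ::-1][:, :k].T                # k tapers with the largest eigenvalues


def mean_multitaper_spectrum(spike_times, bout_duration, win=1.5, nw=1.5, k=2):
    """Average multitaper spectrum over non-overlapping windows of one bout."""
    n = int(round(win * FS))
    counts, _ = np.histogram(spike_times, bins=int(round(bout_duration * FS)),
                             range=(0.0, bout_duration))
    tapers = dpss_tapers(n, nw, k)
    spectra = []
    for start in range(0, counts.size - n + 1, n):
        seg = counts[start:start + n] - counts[start:start + n].mean()
        eigenspectra = np.abs(np.fft.rfft(tapers * seg, axis=1)) ** 2
        spectra.append(eigenspectra.mean(axis=0))     # average over tapers
    return np.fft.rfftfreq(n, 1.0 / FS), np.mean(spectra, axis=0)


def cepstral_period(spectrum, n, lo=0.08, hi=1.0):
    """Quefrency (s) of the largest cepstral peak between lo and hi."""
    log_s = np.log(spectrum + 1e-12 * spectrum.max())   # floor avoids log(0)
    cep = np.fft.irfft(log_s, n=n)                      # inverse FT of the log spectrum
    q = np.arange(n) / FS
    hi = min(hi, n / (2 * FS))
    mask = (q >= lo) & (q <= hi)
    return q[mask][np.argmax(cep[mask])]
```

For a train of three-spike bursts repeating every 200 ms, the cepstral peak in this sketch falls near 0.2 s, while the spectrum shows a comb of harmonics at multiples of 5 Hz.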

To assess the significance of the peaks in these four measures, we compared
the distribution of peak amplitudes obtained from the real data with that of
surrogate data obtained by shuffling the burst times. For this shuffling procedure,
we first identified all the bursts during a bout of singing as described above. We 
then randomly placed bursts sequentially in an interval that has the same duration 
as the song bout; when spikes from two bursts were closer than 30 ms, we repeated 
the random placement until they were spaced by more than 30 ms. Note that this 
randomization procedure only shuffles the burst times and preserves both the 
number of bursts and the ISIs within bursts. Then, all four metrics listed above 
were calculated by applying the same method to these surrogate spike trains. This 
shuffling was repeated (1,000 times for the IBI and autocorrelation, 100 times
for the spectrum and cepstrum), and the P value of each peak was calculated as the
fraction of surrogate spike trains whose peak was larger than the peak obtained
from the real data. A neuron was considered to exhibit
‘rhythmic bursting’ if it had significant peaks in at least two of the four metrics.
The period of the rhythm was defined as the location of the largest peak of the
spike-train autocorrelation between 80 and 1,000 ms.
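The burst-time shuffle can be sketched as follows. It assumes each burst is given as an array of spike times within one bout; bursts are placed one at a time at uniformly random positions, and a placement is redrawn whenever any of its spikes falls within 30 ms of a spike in an already placed burst. The retry cap and the helper names are assumptions. The "significant in at least two of four metrics" rule is included as a one-line helper.

```python
import numpy as np


def shuffle_burst_times(bursts, bout_duration, min_gap=0.030, rng=None, max_tries=10_000):
    """Randomly re-place bursts within a bout, preserving each burst's internal ISIs."""
    rng = np.random.default_rng() if rng is None else rng
    placed = []
    for burst in bursts:
        b = np.asarray(burst, dtype=float)
        offsets = b - b[0]                       # within-burst spike offsets are kept
        span = offsets[-1]
        for _ in range(max_tries):
            candidate = rng.uniform(0.0, bout_duration - span) + offsets
            # Accept only if every spike is more than min_gap from all placed spikes.
            if all(np.min(np.abs(candidate[:, None] - other[None, :])) > min_gap
                   for other in placed):
                placed.append(candidate)
                break
        else:
            raise RuntimeError("could not place burst without violating the minimum gap")
    return sorted(placed, key=lambda spikes: spikes[0])


def is_rhythmic(p_values, alpha=0.05):
    """Rhythmic if at least two of the four metrics (IBI, autocorrelation,
    spectrum, cepstrum) have significant peaks."""
    return sum(p < alpha for p in p_values) >= 2
```

Because only burst positions are randomized, the surrogate keeps the number of bursts and the ISIs within bursts, which is what makes the comparison specific to the rhythm of burst timing.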

Quantification of the probabilistic neural activity during the protosyllable 
stage (Extended Data Fig. 2p). Although many HVC projection neurons recorded 
in the juvenile bird exhibited rhythmic bursts, these bursts did not occur reli- 
ably on every cycle of the rhythm, but instead participated probabilistically 
(Fig. 2a). To quantify the degree of participation, we first extracted the proto- 
syllables based on syllable duration (see earlier section Syllable classification 
and labelling) and examined the fraction of protosyllables in which at least one 
spike occurred (time-window from 30 ms before protosyllable onset to 10 ms 
after protosyllable offset). The fraction of protosyllables in which the neuron 
was active was obtained for all the HVC projection neurons recorded during
the protosyllable stage that showed significant rhythmic bursting (Extended
Data Fig. 2p).
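The participation measure reduces to asking, for each protosyllable, whether any spike falls between 30 ms before onset and 10 ms after offset. A minimal sketch, assuming onset and offset times in seconds; the boolean vector it returns is the same kind of binary participation string used in the pairwise analysis. Function names are illustrative.

```python
import numpy as np


def participation_vector(spike_times, onsets, offsets, pre=0.030, post=0.010):
    """True for each protosyllable with at least one spike in [onset - pre, offset + post]."""
    spikes = np.sort(np.asarray(spike_times, dtype=float))
    active = []
    for on, off in zip(onsets, offsets):
        lo = np.searchsorted(spikes, on - pre, side="left")
        hi = np.searchsorted(spikes, off + post, side="right")
        active.append(hi > lo)                   # any spike inside the window
    return np.array(active, dtype=bool)


def participation_fraction(spike_times, onsets, offsets):
    return participation_vector(spike_times, onsets, offsets).mean()
```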

Analysis of simultaneously recorded pairs of neurons (Extended Data 
Fig. 2q, r). To test whether probabilistic bursting of neurons in the protosyllable 
stage is coordinated across many neurons, we analysed the correlation between 
pairs of simultaneously recorded neurons (Fig. 2a, bottom). This analysis was 
restricted to pairs of neurons that were rhythmically bursting (n= 11 pairs, 3 birds). 
Bursting activity of each neuron was converted to a binary string corresponding 
to its participation in each protosyllable (for the definition of protosyllables, see 
earlier section Syllable classification and labelling). The activity of a neuron was 
assigned a ‘1’ for a protosyllable if the neuron exhibited activity in a time-window 
from 30 ms before protosyllable onset to 10 ms after protosyllable offset, and ‘0’ if 
it did not. Only activity during protosyllables was analysed to avoid including the
highly variable subsong syllables, which are likely generated by circuits outside
HVC. For simultaneously recorded pairs of neurons, this procedure resulted
in two binary strings corresponding to the protosyllable-related activity of each
neuron. We then calculated the coefficient of determination r² by taking the square
of the Pearson's correlation coefficient r between the two binary strings. The
distribution of the coefficient of determination is shown in Extended Data Fig. 2q
(median r² = 0.072, 11 pairs).

We also carried out a mutual information analysis to quantify whether the 
activity of one neuron was predictive of the set of protosyllables for which the 
other neuron was active. Using the same binary representation described above, 
we calculated the joint probability distribution describing the four possible states 
of activity (neither neuron spikes, neuron A spikes, neuron B spikes, both neu- 
rons spike). The mutual information was computed from this joint distribution 
(Extended Data Fig. 2r, median mutual information = 0.056 bits, 11 pairs). 
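Both pairwise measures can be computed directly from the two binary participation strings. The sketch below squares NumPy's Pearson correlation for r² and builds the 2 × 2 joint distribution of the four activity states to obtain mutual information in bits. It assumes neither string is constant (a neuron active on every protosyllable would make r undefined); names are illustrative.

```python
import numpy as np


def r_squared(a, b):
    """Square of the Pearson correlation between two binary strings."""
    r = np.corrcoef(np.asarray(a, dtype=float), np.asarray(b, dtype=float))[0, 1]
    return r ** 2


def mutual_information_bits(a, b):
    """Mutual information (bits) from the joint distribution of the four states."""
    a = np.asarray(a, dtype=int)
    b = np.asarray(b, dtype=int)
    joint = np.zeros((2, 2))
    np.add.at(joint, (a, b), 1)                  # counts of (A state, B state)
    joint /= joint.sum()
    pa = joint.sum(axis=1, keepdims=True)        # marginal of neuron A
    pb = joint.sum(axis=0, keepdims=True)        # marginal of neuron B
    nz = joint > 0                               # 0 * log 0 terms contribute nothing
    return float(np.sum(joint[nz] * np.log2(joint[nz] / (pa @ pb)[nz])))
```

Identical strings give r² = 1 and, for balanced activity, 1 bit; statistically independent strings give values near 0, as reported for most pairs.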

Both the correlation and mutual information were extremely low, suggesting
that different projection neurons participated in relatively independent sets of
protosyllables. These findings suggest that individual projection neurons
participate probabilistically and largely independently in an ongoing rhythmic
protosequence within HVC.

Analysis of coverage by HVC projection neuron bursts (Extended Data 
Fig. 2s, t). We wondered whether projection neuron bursts effectively span the 
entire duration of juvenile song syllables, or whether bursts are highly localized to 
specific times, leaving other times in the syllable unrepresented. It is clear from
the syllable-aligned raster plots that some syllables were completely covered by
bursts (for example, Fig. 3h, syllable ‘C’), while other syllables showed some gaps
in the burst coverage (for example, Fig. 4i, syllable ‘A’). To further quantify this
aspect of the HVC representation during singing, we analysed the fraction of time
within the syllables of juvenile birds that was ‘covered’ by the bursts of the recorded
projection neurons (‘covered fraction’). This analysis was restricted to syllables with
more than 10 associated bursts.

We first determined the region of the song syllable covered by each HVC
projection neuron burst. We generated a histogram of syllable-onset- or
syllable-offset-aligned spike times recorded from a single neuron over every recorded
rendition of the song syllable. Initial identification of candidate burst events was determined
by smoothing the histogram (9 ms sliding square window, 1 ms steps), and setting 
a threshold to define a window in which to analyse burst spikes (2 Hz for protosyl- 
lable stage birds; 10 Hz threshold for older juveniles). To eliminate low-probability 
spike events, we only considered bursts for which spiking activity (at least one 
spike) occurred in the candidate burst window on at least 25% of the renditions 
for that syllable. Bursts were included only if they occurred between 30 ms before 
syllable onset and 10 ms after syllable offset. 

For candidate bursts that met these criteria, all spikes occurring in the burst
window were considered as contributing to that burst. Based on earlier
measurements of postsynaptic currents and potentials of HVC and RA neurons,
each HVC spike in the burst window was conservatively assumed to exert a
postsynaptic effect lasting no more than 5 ms. Thus, each spike in the data set
was replaced with a 5 ms postsynaptic square pulse (beginning at the spike time).
We considered a region of the syllable to be ‘covered’ by this burst if at least three
of these postsynaptic pulses, pooled across renditions of the syllable, overlapped at
that time within the burst. This procedure yielded a small ‘patch’ of time covered
by the burst. The patches associated with each different neuron were combined
with a logical ‘OR’ operation to determine the total coverage time of the
syllable (again in a window from 30 ms before syllable onset to 10 ms after syllable
offset). The covered time was divided by the duration of the syllable window to
determine the covered fraction. Only syllables that had more than 10 neurons
bursting within the syllable window were analysed. This criterion excluded
syllables from bird 3 (shown in Extended Data Fig. 8), from which relatively few
neurons were recorded.
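The coverage computation can be sketched on a discrete time grid. Assumptions not stated in the text: coverage is evaluated at 1 ms resolution, spike times are measured from the start of the syllable window (30 ms before onset), and each burst is passed as the pooled spike times from all renditions. Each spike becomes a 5 ms square pulse, a time point is covered by a burst when at least three pulses overlap there, and bursts are combined with a logical OR.

```python
import numpy as np

STEP = 0.001    # evaluation grid (1 ms, an assumption)
PULSE = 0.005   # assumed postsynaptic effect of each spike


def burst_patch(spike_times, window_len, min_overlap=3):
    """Boolean mask of the syllable window covered by one burst."""
    n = int(round(window_len / STEP))
    width = int(round(PULSE / STEP))
    overlap = np.zeros(n, dtype=int)
    for t in spike_times:
        start = int(np.floor(t / STEP + 1e-9))           # pulse begins at the spike time
        overlap[max(start, 0):min(start + width, n)] += 1
    return overlap >= min_overlap


def covered_fraction(bursts, window_len):
    """Fraction of the syllable window covered by the OR of all burst patches."""
    covered = np.zeros(int(round(window_len / STEP)), dtype=bool)
    for spikes in bursts:
        covered |= burst_patch(spikes, window_len)
    return covered.mean()
```

In this sketch, a burst contributing only two overlapping spikes never reaches the three-pulse criterion and covers nothing.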

While most syllables had nearly complete burst coverage (>90%), one syllable 
had coverage of only 73% (Extended Data Fig. 2t), which could potentially be 
due to the relatively smaller number of neurons recorded in this bird. Thus, we 
asked whether the measured coverage is consistent with sparse sampling of the 
recorded bursts from a large number of uniformly placed bursts. To simulate 
this, we calculated the covered fraction for 1,000 surrogate data sets in which 
the ‘covered patches’ for each burst were randomly shuffled within the syllable. 
A random offset was added to the time of each patch, and a circular shift was 
used, allowing the patches to wrap around the edges of the syllable window. The 
distribution of covered fractions was determined over all shuffled surrogate data
sets, and the 2.5th and 97.5th percentiles (95% confidence interval) of this distribution
were determined (shown as vertical grey bars in Extended Data Fig. 2t). For all
syllables, the observed covered fraction was consistent with that expected for 
random sampling from a uniform underlying distribution of burst times. 
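The surrogate test can be sketched with circular shifts of each burst's covered patch. It assumes the patches are boolean masks on a common grid spanning the syllable window; `np.roll` implements the random offset with wrap-around at the window edges, and the 2.5th and 97.5th percentiles of the surrogate covered fractions give the 95% interval. The function name and seed handling are assumptions.

```python
import numpy as np


def shuffled_coverage_interval(patches, n_surrogates=1000, seed=0):
    """95% interval of covered fraction when each patch is randomly circularly shifted.

    patches: array-like of boolean masks (one per burst) over the syllable window.
    """
    rng = np.random.default_rng(seed)
    patches = np.asarray(patches, dtype=bool)
    n = patches.shape[1]
    fractions = np.empty(n_surrogates)
    for s in range(n_surrogates):
        covered = np.zeros(n, dtype=bool)
        for patch in patches:
            covered |= np.roll(patch, rng.integers(n))   # circular shift, wraps at edges
        fractions[s] = covered.mean()
    return np.percentile(fractions, [2.5, 97.5])
```

An observed covered fraction inside this interval is what the text describes as consistent with random sampling from a uniform distribution of burst times.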

Shared and specific neurons. To examine whether a given HVC projection
neuron was active during multiple syllable types (‘shared’ neuron) or was active only
during a specific syllable type (‘specific’ neuron), we first constructed a
syllable-onset-aligned histogram (1 ms bin, smoothed over 20 bins) for each syllable
type. Spike times were linearly time warped to the mean duration of that syllable 
to reduce the trial-to-trial variability in the spike timing associated with the varia- 
tion in the syllable duration. Next, we found the peak in the firing rate histogram 
in the interval between 30 ms before syllable onset and 10 ms after syllable offset. 
We visually inspected the syllable-aligned histograms, and adjusted the interval 
if necessary to avoid the same burst being detected twice (that is, being associated 
with an offset of one syllable and an onset of the next syllable). The significance 
of this peak was determined by comparing it with the peak size obtained from the 
shuffled histogram using the same method described earlier (in Syllable-related 
neural activity section). 

We defined ‘shared’ and ‘specific’ neurons in the context of a particular syllable
differentiation process (for example, β and γ from bird 1 in Fig. 3; α and β from
bird 2 in Fig. 4; B and D from bird 1 in Extended Data Fig. 7). ‘Specific’ neurons
were defined as neurons that had a significant peak in the syllable-aligned
histogram for only one syllable type, whereas ‘shared’ neurons were defined as neurons
that had significant peaks for both syllable types. We took a conservative approach 
and only considered a neuron to be shared if the peak was significant for both 
syllable types. However, some neurons classified as specific had weak activity for 
the other syllable that did not reach significance (for example, Extended Data 
Fig. 6f). In other words, we believe this method likely underestimated the fraction 
of neurons with shared activity.

Our method likely underestimated the incidence of shared neurons for another
reason as well. Specifically, we defined shared and specific neurons in the context
of a particular pair of syllables undergoing differentiation. For example, in a bird 
that exhibited hierarchical differentiation (bird 1; Extended Data Fig. 7), we saw 
examples of neurons that were B-specific when considering B-C differentiation 
but shared when considering B-D differentiation. Thus, when considering all 
the syllables in the motif, our definition of shared and specific neuron based on 
syllable pairs will underestimate the fraction of shared neurons and overestimate 
the fraction of specific neurons. 
Quantification of the similarity of latencies in shared neurons (Extended 
Data Fig. 4a-d and Extended Data Fig. 8i, j). To test whether shared neurons 
were active at similar latencies for multiple syllable types, we first calculated the 
latency of the peak in the syllable onset- or offset-aligned histograms. We then 
plotted the latency of the peak for one syllable against that of another syllable 
(Extended Data Fig. 4a-d). When a shared neuron was active for three or more
syllables, the two syllables with the two highest firing rates were chosen.
To quantify whether shared neurons were active at similar latencies for two 
syllable types, we calculated the Pearson’s correlation coefficient r between the 
two latencies across shared neurons, and the P value under the null hypothesis 
that r=0. 

For the bird whose song was segmented based on the phase of the rhythm 
(bird 3, Extended Data Fig. 8), we asked whether bursts of shared neurons 
during different syllables occurred at similar phases of the rhythm. To quantify 
the phase of the neural activity, we first detected the burst times during singing, 
and for each burst, we assigned an instantaneous phase extracted from the song 
using the Hilbert transform (see the section on phase segmentation above). 
Then, the mean phase of all the bursts produced during a particular syllable
type was calculated ($\psi_i$, where $i = 1, 2, \ldots, 5$ indexes syllables). Finally, the two
syllable types in which the neuron participated most reliably were chosen, and the
difference between the mean phases for these two syllables
($|\Delta\psi| = |\psi_m - \psi_n|$, where $m$ and $n$ are syllable indices) was obtained
(Extended Data Fig. 8i). We tested the significance of this value by comparing
$|\Delta\psi|$ against that obtained from shuffled data in which the pairing of phases
was randomized across all shared neurons (Extended Data Fig. 8; 1,000
shuffles). P values were obtained as the fraction of shuffles for which $|\Delta\psi|$ of the
surrogate data was smaller than that of the real data, and P < 0.05 was
considered significant.
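The phase-pairing test can be sketched as below. Several choices are assumptions not fixed by the text: mean phases are circular means, phase differences are wrapped to [0, π], and the test statistic is the mean |Δψ| across shared neurons, compared with surrogates in which the ψ_n values are randomly re-paired (permuted) across neurons.

```python
import numpy as np


def circular_mean(phases):
    """Circular mean (radians) of burst phases for one syllable type."""
    return np.angle(np.mean(np.exp(1j * np.asarray(phases))))


def wrapped_abs_diff(a, b):
    """|a - b| wrapped to [0, pi]."""
    return np.abs(np.angle(np.exp(1j * (np.asarray(a) - np.asarray(b)))))


def phase_pairing_test(psi_m, psi_n, n_shuffles=1000, seed=0):
    """psi_m, psi_n: per-neuron mean phases for the two most reliable syllables."""
    rng = np.random.default_rng(seed)
    psi_m, psi_n = np.asarray(psi_m), np.asarray(psi_n)
    observed = wrapped_abs_diff(psi_m, psi_n).mean()
    n_smaller = 0
    for _ in range(n_shuffles):
        shuffled = wrapped_abs_diff(psi_m, rng.permutation(psi_n)).mean()
        n_smaller += int(shuffled < observed)
    return observed, n_smaller / n_shuffles
```

A small returned fraction means the real pairing keeps each neuron's bursts at more similar phases than random pairings do.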

Quantification of the activity level difference in shared neurons (Extended 
Data Fig. 4i, j). To quantify the difference in the activity level for multiple syllable 
types in the shared neurons, we calculated the ‘bias’ defined as follows: 
where $r_i$ is the peak firing rate in the syllable-aligned histogram for syllable $i$. Bias
of 0 indicates equal activity level for both syllable types, whereas bias of 1 indicates 
exclusive activity for only one of the syllable types (Extended Data Fig. 4j). 
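The defining equation for the bias did not survive extraction. The sketch below assumes the normalized difference |r₁ − r₂| / (r₁ + r₂), one form that satisfies both stated properties (0 for equal peak rates, 1 for activity during only one syllable type); treat it as an illustration, not the published definition.

```python
def bias(r1, r2):
    """Assumed bias: |r1 - r2| / (r1 + r2) for peak firing rates r1, r2 (spikes/s)."""
    if r1 + r2 == 0:
        raise ValueError("bias is undefined when both peak rates are zero")
    return abs(r1 - r2) / (r1 + r2)
```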
Analysis of acoustic features associated with bursts of shared neurons 
(Extended Data Fig. 5). We wondered if the bursts of shared neurons were 
associated with different acoustic signals in the shared syllables at the time of 
the bursts. (An alternative possibility is that shared neurons burst only at times 
within the emerging syllable types when the acoustic signals are identical.) 
An example of a neuron analysed here is shown in Extended Data Fig. 5a
(from the same data shown in Fig. 3e). This neuron bursts just after the onset
of both syllables β and γ. We analysed the acoustic differences in a 0-50 ms
analysis window after the burst time, but were most interested in acoustic
differences in a narrower premotor window (10-40 ms), as this corresponds to
the premotor latency at which HVC neurons are expected to exert an effect on
vocal output.

For each neuron analysed, all syllables in which the neuron generated a burst
were identified. The analysis was carried out for every syllable rendition on
which the neuron burst, and was restricted to only those syllables. Syllables had
previously been labelled by type (that is, β and γ). We first directly visualized
the spectral differences between the two syllable types using a sparse contour
representation, which is suitable for constructing an ‘average’ spectrogram.
The analysis was carried out on the sound signal extracted from a 50 ms window 
after each burst. In many cases, this spectral representation revealed consistent 
differences between the different syllable types in this analysis window (Extended 
Data Fig. 5b, c). 

One complication is that some of the shared neurons burst before syllable
onsets or immediately before syllable offsets, such that the 10-40 ms window
after the burst fell within a silent gap (9 of 24 HVCRA neurons and 59 of
120 HVCX neurons). These neurons were excluded from the analysis of acoustic
differences.

We further quantified differences in the acoustic signals by extracting time 
varying acoustic and spectral features in a window 0-50 ms after burst time (see 
subsection Definition of bursts). We used 8 acoustic features previously
established for analysing birdsong (Wiener entropy, spectral centre of gravity, spectral
width, pitch, pitch goodness, sound amplitude, amplitude modulation, frequency
modulation). The 8-dimensional feature vector was calculated in 1 ms steps
over the 50 ms analysis window (Extended Data Fig. 5d, e).

Because each syllable was labelled, we could determine whether the feature
trajectories were significantly different for syllables labelled β and those labelled γ, and
make this determination at every time step in the analysis window (Extended
Data Fig. 5d, e; s.e.m. indicated by shaded region around mean trajectory).
Rather than quantify the difference in these trajectories one feature at a time,
we used Fisher’s discriminant analysis to project the 8-dimensional acoustic
feature vector onto a single dimension that gives maximum separability between 
the two syllable types. The projected direction is determined independently at 
each time point, and the feature vectors of all syllable renditions are projected, at 
each time point, to yield a distribution of projected samples. For most neurons, 
the different syllable types produce visibly different distributions of projected 
samples (Extended Data Fig. 5f), indicating distinct acoustic structure. The
separability of the distributions (in one dimension) of projected samples for different
syllable types was quantified using the d-prime metric (d’), corresponding to
the distance between the means of the distributions, normalized by the pooled
standard deviation.
Because the features evolve in time, this analysis is carried out independently
at each 1 ms step in the 50 ms analysis window, and d’ was plotted as a
function of time (Extended Data Fig. 5g). Statistical significance of the d’ trajectory
was assessed by randomizing the syllable labels and rerunning the d’ analysis on
shuffled data sets (N = 1,000 shuffles). For each randomization, the peak value
of d’ in the 10-40 ms premotor window was recorded; the significance threshold was
set at the 95th percentile of the distribution of these peak values. A shared neuron was
determined to have a significant acoustic difference between the shared syllables
only if the d’ trajectory remained above this significance threshold for the entire
premotor window of 10-40 ms after the burst. Note that, in the simulated data,
none of the 1,000 surrogate runs generated a d’ trajectory that met this stringent
criterion.
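The discriminant and significance steps can be sketched in NumPy. Assumptions: feature trajectories are arrays of shape (renditions, time steps, features) sampled at 1 ms from the burst time; the Fisher direction is w = S_W⁻¹(μ_a − μ_b) with a small ridge added for numerical stability; and d’ is the difference of projected means divided by the square root of the mean of the two projected variances (the published equation was lost in extraction). Index 10 to 40 of the trajectory is taken as the 10-40 ms premotor window.

```python
import numpy as np


def fisher_dprime(feat_a, feat_b, ridge=1e-6):
    """d' of two syllable types after projection onto Fisher's discriminant.

    feat_a, feat_b: (renditions, features) arrays at a single time step.
    """
    mu_a, mu_b = feat_a.mean(axis=0), feat_b.mean(axis=0)
    s_w = np.cov(feat_a, rowvar=False) + np.cov(feat_b, rowvar=False)
    s_w += ridge * np.eye(s_w.shape[0])             # ridge term is an assumption
    w = np.linalg.solve(s_w, mu_a - mu_b)           # Fisher direction
    pa, pb = feat_a @ w, feat_b @ w                 # 1-D projected samples
    pooled_sd = np.sqrt((pa.var(ddof=1) + pb.var(ddof=1)) / 2)
    return abs(pa.mean() - pb.mean()) / pooled_sd


def dprime_trajectory(traj_a, traj_b):
    """d' at each time step; the projection is refitted independently per step."""
    return np.array([fisher_dprime(traj_a[:, t], traj_b[:, t])
                     for t in range(traj_a.shape[1])])


def significant_difference(traj_a, traj_b, window=slice(10, 41), n_shuffles=1000, seed=0):
    """True if d' stays above the shuffled 95th-percentile peak across the whole window."""
    rng = np.random.default_rng(seed)
    real = dprime_trajectory(traj_a, traj_b)
    pooled = np.concatenate([traj_a, traj_b])
    n_a = len(traj_a)
    peaks = np.empty(n_shuffles)
    for s in range(n_shuffles):
        perm = rng.permutation(len(pooled))         # randomize syllable labels
        shuffled = dprime_trajectory(pooled[perm[:n_a]], pooled[perm[n_a:]])
        peaks[s] = shuffled[window].max()
    threshold = np.percentile(peaks, 95)
    return bool(np.all(real[window] > threshold)), real, threshold
```

Requiring the whole window to exceed a threshold derived from shuffled peaks is deliberately conservative, since a single time step above threshold would be much easier to reach by chance.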

Statistics. Results are expressed as the mean ± s.d. or s.e.m., as indicated. For χ²
tests, if the contingency table included a cell with an expected frequency less
than 5, Fisher’s exact test was used. All tests were two-sided, and P < 0.05 was
considered significant. Bonferroni correction was used to account for multiple
comparisons.

Figure 1f. The statistical significance of developmental changes in the fraction 
of HVC neurons that were syllable-aligned was assessed in two different ways: 
(1) Each stage was compared with the adult stage using the χ² test followed by a
post-hoc pairwise test. (2) To quantify the developmental trend in the fraction of 
syllable-locked neurons, we calculated Pearson's correlation coefficient r between 
the binary value for each neuron (0, unlocked; 1, locked) and song stage (subsong: 
1, protosyllable: 2, multi-syllables: 3, motif: 4, adult: 5). The P value was calculated 
under the null hypothesis that r= 0. The significance of the developmental trend 
for rhythmic bursting was calculated similarly. Similar results were obtained for 
correlation between these metrics and the age at which each neuron was recorded, 
rather than song stage. 

Figure 1g. The statistical significance of developmental changes in the period 
of the HVC rhythm was also assessed in two different ways: (1) Each song stage 
was compared with the adult stage using the Kruskal-Wallis test followed by a 
post-hoc pairwise test. (2) To quantify the developmental trend in the period of 
the HVC rhythm, we calculated Pearson’s correlation coefficient r between burst 
period and song stage. Similar results were obtained for correlation between burst 
period and the age at which each neuron was recorded. 

Figure 2c. The Wilcoxon rank-sum test was used to test whether the median 
of the syllable-onset aligned latency distribution was different between subsong 
and protosyllable stages. 

Figures 3g, h and 4h, i. To test whether the fraction of shared neurons differed
between early and late stages of syllable differentiation, we used the χ² test
on a 2 × 2 contingency table (shared/specific, early/late). To test whether the
fraction of shared neurons differed between early and late stages of syllable
differentiation across all birds (n = 5 syllable pairs in 3 birds), we used the
Cochran-Mantel-Haenszel test for repeated tests of independence.
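The Cochran-Mantel-Haenszel statistic pools K stratified 2 × 2 tables (here, one shared/specific by early/late table per syllable pair). A minimal sketch of the standard formula with Yates-style continuity correction; whether the original analysis applied that correction is not stated, so it is exposed as an option. For 1 degree of freedom the χ² survival function equals erfc(√(x/2)), so only the standard library is needed.

```python
import math


def cochran_mantel_haenszel(tables, continuity=True):
    """CMH test for a list of 2x2 tables [[a, b], [c, d]]; returns (statistic, P)."""
    num = 0.0
    var = 0.0
    for (a, b), (c, d) in tables:
        n = a + b + c + d
        row1, row2 = a + b, c + d
        col1, col2 = a + c, b + d
        num += a - row1 * col1 / n                          # observed minus expected
        var += row1 * row2 * col1 * col2 / (n ** 2 * (n - 1))
    if var == 0:
        raise ValueError("tables carry no information (zero variance)")
    correction = 0.5 if continuity else 0.0
    stat = max(abs(num) - correction, 0.0) ** 2 / var
    p = math.erfc(math.sqrt(stat / 2))                      # chi-square, 1 d.f.
    return stat, p
```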

Extended Data Fig. 1a. To quantify the relation between song stage and age,
we calculated Spearman's rank correlation coefficient ρ and the P value under the
null hypothesis that ρ = 0.

Extended Data Fig. 1c. We computed the statistical significance of develop- 
mental changes in burst width (top) and firing rate during bursts (bottom) by 
using the Kruskal-Wallis test followed by a post-hoc pairwise test to compare 
each stage with the adult stage. 

Extended Data Fig. 2m-o. To test whether the fraction of syllable-locked
neurons (Extended Data Fig. 2m), the fraction of rhythmic neurons (Extended Data
Fig. 2n), and the period of the HVC rhythm (Extended Data Fig. 2o) significantly
differed between HVCRA and HVCX neurons, we used the χ² test for all the pairwise
comparisons with Bonferroni correction for multiple comparisons.

Extended Data Fig. 4a-d. To calculate the relation between latencies of bursts 
associated with shared neurons, we calculated the Pearson’s correlation coefficient 
r together with the P value under the null hypothesis that r=0. 

Extended Data Fig. 5m, n. To test whether the mean d’ metric was different
between HVCRA and HVCX neurons, we used the Wilcoxon rank-sum test. Only
neurons with d’ trajectories that were significant (continuously from 10-40 ms) were
included in this comparison.

Neural model of chain formation and splitting. Code used to simulate the 
model is available as Supplementary Information. To illustrate a potential mech- 
anism of chain splitting, we chose to implement the model as simply as possible. 
We modelled neurons as binary units and simulated their activity in discrete
time steps; at each time step (10 ms), the $i$th neuron either bursts ($x_i = 1$) or
is silent ($x_i = 0$).

Network architecture. A network of 100 binary neurons is recurrently connected
in an all-to-all manner, with $W_{ij}$ representing the synaptic strength from
presynaptic neuron $j$ to postsynaptic neuron $i$. Self-excitation is prevented by
setting $W_{ii} = 0$ for all $i$ at all times. During learning, the strength of each synapse
is constrained to be within the interval $[0, w_{\max}]$, while the total incoming and outgoing
weights of each neuron are both constrained by the ‘soft bound’ $W_{\max} = m\,w_{\max}$, where $m$
represents a target number of saturated synapses per neuron (see section Synaptic
plasticity rules for details). Note that $w_{\max}$ represents a hard maximum weight of
each individual synapse, while $W_{\max}$ represents a soft maximum total synaptic
input or output of any one neuron. Synaptic weights are initialized from a random
uniform distribution such that each neuron receives, on average, its maximum
allowable total input, $W_{\max}$.

Network dynamics. The activity of each neuron in the network was determined 
in two steps: calculating the net feedforward input that comes from the previous 
time step; then determining whether that is enough to overcome the recurrent 
inhibition in the current time step. 

First, the net feedforward input to the $i$th neuron at time step $t$, $A_i^{\mathrm{FF}}(t)$, was
calculated by summing the excitation, feedforward inhibition, neural adaptation,
and external inputs:

$$A_i^{\mathrm{FF}}(t) = \left[A_i^{\mathrm{E}}(t) - A^{\mathrm{FI}}(t) - A_i^{\mathrm{adapt}}(t) + B_i(t) - \theta_i\right]_+,$$

where $[z]_+$ indicates rectification (equal to $z$ if $z > 0$ and 0 otherwise).
$A_i^{\mathrm{E}}(t) = \sum_j W_{ij}\,x_j(t-1)$ is the excitatory input from network activity on the
previous time step. $A^{\mathrm{FI}}(t) = \beta \sum_j x_j(t-1)$ is a global feedforward inhibitory
input, where $\beta$ sets the strength of this feedforward inhibition. $A_i^{\mathrm{adapt}}(t) = \alpha\,y_i(t)$
is an adaptation term, where $\alpha$ is the strength of adaptation, and $y_i$ is a low-pass
filtered record of recent activity in $x_i$ with time constant $\tau_{\mathrm{adapt}} = 40$ ms; that is,
$\tau_{\mathrm{adapt}}\,dy_i/dt = -y_i + x_i$. $B_i(t)$ is the external input to neuron $i$ at time $t$. For seed
neurons, this term consists of training inputs (see section on Seed neurons).
For non-seed neurons, it consists of random inputs occurring with probability 0.01 in
each time step, with size $W_{\max}/10$. Finally, $\theta_i$ is a threshold term used to reduce the
excitability of seed neurons, making them less responsive to recurrent input than
are other neurons in the network. For seed neurons, $\theta_i = 10$ and for non-seed
neurons, $\theta_i = 0$. Including this term improves robustness of the training procedure
by eliminating occasional situations in which seed neuron activity may be dom- 
inated by recurrent rather than external inputs. In these cases, external inputs may 
fail to exert proper control of network activity. 

Second, we determined whether the $i$th neuron will burst or not at time step $t$
by examining whether the net feedforward input, $A_i^{\mathrm{FF}}(t)$, exceeds the recurrent
inhibition, $A^{\mathrm{I}}(t)$. We implemented recurrent inhibition by estimating the total
input to the network at time $t$:

$$A^{\mathrm{I}}(t) = \gamma \sum_j A_j^{\mathrm{FF}}(t)$$

and feeding it back to all the neurons. Parameter $\gamma$ sets the strength of the
recurrent inhibition. We assume that this recurrent inhibition operates on a fast time
scale (that is, faster than the duration of a burst). Thus, the final output of the
$i$th neuron at time $t$ becomes:

$$x_i(t) = \Theta\left[A_i^{\mathrm{FF}}(t) - A^{\mathrm{I}}(t)\right],$$

where $\Theta[z]$ is the Heaviside step function (equal to 1 if $z > 0$ and 0 otherwise). To
induce splitting, $\gamma$ was gradually stepped up to $\gamma_{\mathrm{split}}$ following a sigmoid with time
constant $\tau$ and inflection point $t_0$:


$$\gamma(t) = \frac{\gamma_{\mathrm{split}}}{1 + e^{-(t - t_0)/\tau}}$$

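One update of the binary network can be sketched in NumPy to make the order of operations concrete. This is an illustrative sketch, not the Supplementary Information code. Assumptions: the adaptation filter is advanced with a forward-Euler step of τ_adapt dy/dt = −y + x after the outputs are computed; recurrent inhibition is γ times the summed net feedforward input; and the γ schedule is a plain sigmoid rising from 0 to γ_split. The values of α, β and γ are not given here and are left as arguments.

```python
import numpy as np

DT = 10.0          # ms per time step
TAU_ADAPT = 40.0   # ms


def network_step(W, x_prev, y, B, theta, alpha, beta, gamma):
    """One 10 ms update of the binary network; returns new activity x and filter y."""
    a_exc = W @ x_prev                                # A_i^E(t) = sum_j W_ij x_j(t-1)
    a_ffi = beta * x_prev.sum()                       # A^FI(t): global feedforward inhibition
    a_adapt = alpha * y                               # A_i^adapt(t) = alpha * y_i
    a_ff = np.maximum(a_exc - a_ffi - a_adapt + B - theta, 0.0)   # rectification [.]_+
    a_inh = gamma * a_ff.sum()                        # A^I(t): fast recurrent inhibition
    x = (a_ff - a_inh > 0).astype(float)              # Heaviside step
    y = y + (DT / TAU_ADAPT) * (x - y)                # Euler step of the low-pass filter
    return x, y


def gamma_schedule(t, gamma_split, t0, tau):
    """Assumed sigmoid ramp of recurrent inhibition towards gamma_split."""
    return gamma_split / (1.0 + np.exp(-(t - t0) / tau))
```

In a three-neuron chain (W[1, 0] = W[2, 1] = 1, with W_ii = 0), activity in neuron 0 hands off to neuron 1 on the next step, while strong feedforward inhibition (β larger than the excitation) silences the network.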
Seed neurons. A subset of neurons was designated as seed neurons, which received
external training inputs used to shape network activity during learning. The
external training inputs activate seed neurons at syllable onsets, reflecting the 
observed onset-related bursts of HVC neurons during the subsong stage (Fig. 1a). 
The pattern of these inputs was adjusted in different stages of learning, and each 
strategy of syllable learning was implemented by different patterns of seed neuron 
training inputs. 

Alternating differentiation (Fig. 5a-e). Ten neurons were designated as seed neurons
and received strong external input ($W_{\max}$) to drive network activity. In the subsong
stage, seed neurons were driven (by external inputs) synchronously and randomly
with probability 0.1 in each time step, corresponding to the random occurrence
of syllable onsets in subsong. This was done only to visualize network
activity; no learning was implemented at the subsong stage. During the protosyllable
stage, seed neurons were driven synchronously and rhythmically with a period
T = 100 ms. The protosyllable stage consisted of 500 iterations of 10 pulses each.
To initiate chain splitting, the seed neurons were divided into two groups and
each group was driven on alternate cycles. The splitting stage consisted of 2,000
iterations of 5 pulses in each group of seed neurons (1 s total per iteration, as in
the protosyllable stage).

Motif strategy (Extended Data Fig. 10e-h). This was implemented in a similar man- 
ner as alternating differentiation, except that 9 seed neurons were used, and for the 
splitting stage, seed neurons were divided into 3 groups of 3 neurons, each driven 
on every third cycle. 
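The rhythmic seed-neuron drive for the protosyllable stage, alternating differentiation and the motif strategy can be generated by one schedule that cycles through groups of seed neurons every T = 100 ms (10 time steps). A sketch with illustrative names; the input strength (W_max) would be applied wherever the returned mask is True.

```python
import numpy as np

PERIOD_STEPS = 10   # T = 100 ms at 10 ms per time step


def seed_drive(n_seeds, n_groups, n_steps, period=PERIOD_STEPS):
    """Boolean (n_steps, n_seeds) mask of which seed neurons are pulsed at each step.

    n_groups=1: all seeds driven synchronously every cycle (protosyllable stage).
    n_groups=2: two groups driven on alternate cycles (alternating differentiation).
    n_groups=3: three groups driven on every third cycle (motif strategy).
    """
    drive = np.zeros((n_steps, n_seeds), dtype=bool)
    groups = np.array_split(np.arange(n_seeds), n_groups)
    for cycle, step in enumerate(range(0, n_steps, period)):
        drive[step, groups[cycle % n_groups]] = True
    return drive


# One 1 s iteration (100 steps): 10 pulses per seed in the protosyllable stage,
# 5 pulses per seed in each of the two groups during splitting.
proto = seed_drive(10, 1, 100)
split = seed_drive(10, 2, 100)
```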

Bout-onset differentiation (Extended Data Fig. 10a-d). Seed neurons were divided
into two groups: 5 bout-onset seed neurons and 5 protosyllable seed neurons.
At all learning stages, external inputs were organized into bouts consisting of four
separate input pulses, and bout-onset seed neurons were driven at the beginning
of each bout. Then, 30 ms later, protosyllable seed neurons were driven three times
with an interval of T = 100 ms. In the protosyllable stage, inputs to all seed neurons
were of strength Wmax. In the splitting stage, the input to protosyllable seed
neurons was decreased to Wmax/10. This allowed neurons in the bout-onset chain
to suppress, through fast recurrent inhibition, the activity of protosyllable seed
neurons during bout-onset syllables.

Each iteration of the simulation was 5 s long, consisting of 10 bouts, described
directly above, with random inter-bout intervals. The protosyllable stage consisted
of 100 iterations, and the splitting stage consisted of 500 iterations.
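
The bout-structured input for bout-onset differentiation can be sketched as follows. The pulse timing (bout-onset seeds at bout onset, protosyllable seeds 30 ms later and then twice more at T = 100 ms) and the input strengths (Wmax, reduced to Wmax/10 for protosyllable seeds in the splitting stage) follow the text. The seed indices, the value of `W_MAX`, the 300 ms minimum spacing between bout onsets and the way random inter-bout gaps are drawn are all assumptions, since the text does not specify them.

```python
import random

W_MAX = 1.0                         # placeholder; Wmax itself changes during learning
BOUT_ONSET_SEEDS = list(range(5))   # hypothetical indices of the 5 bout-onset seeds
PROTO_SEEDS = list(range(5, 10))    # hypothetical indices of the 5 protosyllable seeds
BOUT_SPAN_MS = 300                  # assumed minimum spacing of bout onsets (last pulse at +230 ms)

def bout_pulses(t0_ms, stage):
    """Four pulses of one bout as (time_ms, seed_indices, input_strength)."""
    proto_w = W_MAX if stage == "protosyllable" else W_MAX / 10  # weaker in the splitting stage
    pulses = [(t0_ms, BOUT_ONSET_SEEDS, W_MAX)]
    for k in range(3):  # 30 ms after bout onset, then every T = 100 ms
        pulses.append((t0_ms + 30 + 100 * k, PROTO_SEEDS, proto_w))
    return pulses

def iteration_pulses(stage, n_bouts=10, duration_ms=5000, seed=None):
    """One 5 s iteration: n_bouts bouts separated by random gaps.

    The gap distribution is not given in the text; here the spare time is
    divided at uniformly random cut points (an assumption).
    """
    rng = random.Random(seed)
    spare = duration_ms - n_bouts * BOUT_SPAN_MS
    cuts = sorted(rng.uniform(0, spare) for _ in range(n_bouts))
    pulses, t, prev_cut = [], 0.0, 0.0
    for cut in cuts:
        t += cut - prev_cut   # random gap before this bout
        prev_cut = cut
        pulses.extend(bout_pulses(t, stage))
        t += BOUT_SPAN_MS
    return pulses
```

With these assumptions every pulse of the 10 bouts falls inside the 5 s iteration, and the only difference between the two stages is the strength of the protosyllable-seed input.
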

Bout-onset syllable formation (Extended Data Fig. 10i-k). Input to seed neurons
was set high (2.5 × Wmax), and maintained at this high level throughout development.
This prevented protosyllable seed neurons from being inhibited by neurons
in the bout-onset chain. Furthermore, strong external input to the protosyllable
seed neurons terminated activity in the bout-onset chain through fast recurrent
inhibition, thus preventing the further growth of the bout-onset chain that occurs
in bout-onset differentiation.

As in bout-onset differentiation, each iteration of the simulation was 5 s long,
consisting of 10 bouts with random inter-bout intervals. The protosyllable stage
consisted of 100 iterations, and the splitting stage consisted of 500 iterations.

Synaptic plasticity rules. As in previous models, we hypothesized two plasticity
rules in our model: Hebbian spike-timing-dependent plasticity (STDP) to drive
sequence formation, and heterosynaptic long-term depression (hLTD) to introduce
competition between synapses of a given neuron. STDP is governed by the
antisymmetric plasticity rule with a short temporal window (one burst duration):


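$$\Delta W^{\mathrm{STDP}}_{ij}(t) = \eta\,\big[x_i(t)\,x_j(t-1) - x_i(t-1)\,x_j(t)\big]$$

where the constant η sets the learning rate. hLTD limits the total strength of weights
for neuron i, and the summed weight limit rule for incoming weights is given by:

$$\Delta W^{\mathrm{in}}_{i}(t) = \Big[\sum_k \big(W_{ik}(t-1) + \Delta W^{\mathrm{STDP}}_{ik}(t)\big) - m\,W_{\max}\Big]_+$$

and for outgoing weights from neuron j:

$$\Delta W^{\mathrm{out}}_{j}(t) = \Big[\sum_k \big(W_{kj}(t-1) + \Delta W^{\mathrm{STDP}}_{kj}(t)\big) - m\,W_{\max}\Big]_+$$

At each time step, total change in synapse weight is given by the combination
of STDP and hLTD:

$$\Delta W_{ij}(t) = \Delta W^{\mathrm{STDP}}_{ij}(t) - \varepsilon\,\Delta W^{\mathrm{in}}_{i}(t) - \varepsilon\,\Delta W^{\mathrm{out}}_{j}(t)$$

where ε sets the relative strength of hLTD.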
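
One time step of the combined STDP and hLTD update can be sketched with NumPy as below. This is a minimal sketch, not the authors' simulation code. It assumes that W[i, j] is the weight from neuron j onto neuron i and that activity is a 0/1 vector per time step. It also clips weights to [0, Wmax], reading "the maximum allowed strength of any synapse" as a hard bound; the lower bound of 0 is our assumption. The function name `plasticity_step` is hypothetical.

```python
import numpy as np

def plasticity_step(W, x_prev, x_now, eta, eps, w_max, m):
    """Apply one time step of STDP plus heterosynaptic LTD.

    Assumed conventions: W[i, j] is the weight from neuron j onto neuron i, and
    x_prev, x_now are 0/1 activity vectors at times t-1 and t.
    """
    x_prev = np.asarray(x_prev, dtype=float)
    x_now = np.asarray(x_now, dtype=float)
    # Antisymmetric STDP: eta * [x_i(t) x_j(t-1) - x_i(t-1) x_j(t)].
    dW_stdp = eta * (np.outer(x_now, x_prev) - np.outer(x_prev, x_now))
    W_tent = W + dW_stdp
    # Rectified excess of summed incoming (row i) and outgoing (column j) weight
    # over the limit m * Wmax.
    excess_in = np.maximum(W_tent.sum(axis=1) - m * w_max, 0.0)
    excess_out = np.maximum(W_tent.sum(axis=0) - m * w_max, 0.0)
    dW = dW_stdp - eps * excess_in[:, None] - eps * excess_out[None, :]
    # Assumed hard bounds: no negative weights, no synapse above Wmax.
    return np.clip(W + dW, 0.0, w_max)
```

For example, if neuron 0 fires at t-1 and neuron 1 fires at t, the synapse from 0 onto 1 is strengthened by η, while the reverse synapse is depressed (and here clipped at zero). Because hLTD subtracts the same amount from every synapse of an over-budget neuron, strong synapses survive while weak ones are pushed towards zero, which produces competition.
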

Model parameters: subsong (Fig. 5a). In our implementation of the subsong stage,
there was no learning. Subsong model parameters were: β = 0.115, α = 30, η = 0,
ε = 0, γ = 0.01.

Model parameters: alternating differentiation (Fig. 5b-d). After subsong,
learning progressed in two stages: the protosyllable stage and the splitting stage.
Parameters that remained constant over development were: β = 0.115, α = 30,
η = 0.025, ε = 0.2. To induce chain splitting, Wmax, the maximum allowed strength
of any synapse, was increased from 1 to 2, m was decreased from 10 to 5, and γ
was increased from 0.01 to 0.18 following a sigmoid with time constant τ = 200
iterations and inflection point t0 = 500 iterations into the splitting stage. No change
in parameters occurred before the chain-splitting stage.
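
The sigmoidal parameter ramp can be sketched as below. The text gives only the start and end values, the time constant τ and the inflection point t0, so the standard logistic function is assumed here. The function name `sigmoid_schedule` is hypothetical. The text applies the sigmoid explicitly to γ; whether Wmax and m followed the same time course is not stated, although the same function would also cover a decreasing ramp such as m going from 10 to 5.

```python
import math

def sigmoid_schedule(t, start, end, t0, tau):
    """Logistic ramp from `start` to `end`, centred at iteration t0 with time constant tau.

    Assumption: standard logistic form. Note that at t = 0 the value is close
    to, but not exactly, `start`, and it only approaches `end` asymptotically.
    """
    return start + (end - start) / (1.0 + math.exp(-(t - t0) / tau))

# gamma during the splitting stage of alternating differentiation:
# 0.01 -> 0.18, tau = 200 iterations, inflection at t0 = 500 iterations.
gamma = [sigmoid_schedule(t, 0.01, 0.18, t0=500, tau=200) for t in range(2000)]
```

At the inflection point the value sits exactly halfway between the end points (0.095 for γ in this case).
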

Model parameters: bout-onset differentiation (Extended Data Fig. 10a-d).
Parameters that remained constant over development were: β = 0.13, α = 30,
η = 0.05, ε = 0.14. To induce chain splitting, Wmax was increased from 1 to 2, m
was decreased from 5 to 2.5, and γ was increased from 0.01 to 0.04 following
a sigmoid with time constant τ = 200 iterations and inflection point t0 = 250
iterations into the splitting stage.

Model parameters: motif strategy (Extended Data Fig. 10e-h). Parameters that
remained constant over development were: β = 0.115, α = 30, η = 0.025, ε = 0.2.
To induce chain splitting, Wmax was increased from 1 to 2, m was decreased
from 9 to 3, and γ was increased from 0.01 to 0.18 following a sigmoid with
time constant τ = 200 iterations and inflection point t0 = 500 iterations into
the splitting stage.

Model parameters: formation of a new syllable at bout onset (Extended Data
Fig. 10i-k). Parameters that remained constant over development were: β = 0.13,
α = 30, η = 0.05, ε = 0.15. To induce chain splitting, Wmax was increased from 1 to
2, m was decreased from 5 to 2.5, and γ was increased from 0.01 to 0.05 following
a sigmoid with time constant τ = 200 iterations and inflection point t0 = 250
iterations into the splitting stage.

Shared and specific neurons. Neurons were classified as participating in a syllable 
type if the syllable onset-aligned histogram exhibited a peak that passed a threshold 
criterion. The criteria were chosen to include neurons where the histogram peak 
exceeded 90% of surrogate histogram peaks. Surrogate histograms were generated 
by placing one burst at a random latency in each syllable. (For example, in the pro- 
tosyllable stage, the above criterion was found to be equivalent to having 5 bursts at 
the same latency in a bout of 10 protosyllables.) During the splitting phase, neurons 
were classified as shared if they participated in both syllable types, and specific if 
they participated in only one syllable type. 
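
The participation criterion can be sketched as a surrogate test. Following the text, each surrogate places one burst at a random latency in every syllable rendition, and a neuron is counted as participating if its observed histogram peak exceeds at least 90% of the surrogate peaks. The histogram bin width, the number of surrogates, the use of a uniform latency distribution and the function names are assumptions.

```python
import random

def peak_count(latencies_ms, bin_ms):
    """Height of the tallest bin in a latency histogram."""
    counts = {}
    for lat in latencies_ms:
        b = int(lat // bin_ms)
        counts[b] = counts.get(b, 0) + 1
    return max(counts.values(), default=0)

def participates(latencies_ms, durations_ms, bin_ms=10.0, n_surrogates=1000, seed=0):
    """Hypothetical test of whether a neuron participates in one syllable type.

    latencies_ms: burst latencies from syllable onset, pooled over renditions.
    durations_ms: duration of each syllable rendition; each surrogate places one
    burst at a uniformly random latency within every rendition (assumption).
    Returns True if the observed peak exceeds at least 90% of surrogate peaks.
    """
    rng = random.Random(seed)
    observed = peak_count(latencies_ms, bin_ms)
    surrogate = [peak_count([rng.uniform(0, d) for d in durations_ms], bin_ms)
                 for _ in range(n_surrogates)]
    frac_exceeded = sum(p < observed for p in surrogate) / n_surrogates
    return frac_exceeded >= 0.9
```

A neuron that bursts at the same latency in every rendition passes easily. A neuron whose bursts are spread evenly across the syllable never exceeds the surrogate peaks and is excluded.
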

Visualizing network activity. We visualized network activity in two ways: network 
diagrams, and raster plots of population activity (for example, Fig. 5a-d, top and
bottom panels, respectively). In both cases, we only included neurons that participated
in at least one of the syllable types (see earlier section Shared and specific
neurons for participation criteria). 

Network diagrams. Neurons are sorted along the x axis based on their relative 
latencies. Neurons are sorted along the y axis based on the relative strength of their 
synaptic input from specific neurons (or seed neurons) of each type (red or blue). 
Lines between neurons correspond to feedforward synaptic weights, and darker 
lines indicate stronger synaptic weights. For clarity of plotting, only the strongest 
six outgoing and strongest nine incoming weights are plotted for each neuron. 

Population raster plots. Neurons are sorted from top to bottom according to their
latency. Groups of seed neurons are indicated by magenta arrows. Shared neurons
are plotted at the top and specific neurons are plotted below. As for network
diagrams, neurons that did not reliably participate in at least one syllable type were
excluded.

Further details for Fig. 5a-d. Panels show network diagrams and raster plots at four
different stages. Figure 5a shows the subsong stage (before learning), Fig. 5b the
end of the protosyllable stage (iteration 500), Fig. 5c the early chain-splitting stage
(iteration 992), and Fig. 5d the late chain-splitting stage (iteration 2,500).

Further details for Extended Data Fig. 10a-d. Extended Data Fig. 10a shows the early
protosyllable stage (iteration 5), Extended Data Fig. 10b the late protosyllable
stage (iteration 100), Extended Data Fig. 10c the early chain-splitting stage
(iteration 130), and Extended Data Fig. 10d the late chain-splitting stage (iteration 600).


Code availability. Code used to simulate the model is available as Supplementary
Information.
Extended Data Figure 1 | Bursting and syllable-locked activity in HVC
projection neurons of juvenile birds. a, Range of bird ages at which
songs were classified at different developmental stages (Spearman's rank
correlation between age and stage ρ = 0.61; red line indicates the median,
box indicates the 25-75 percentile, and whiskers indicate 10-90 percentile;
n = 12, 13, 18 and 6 birds, respectively; n = 39, 135, 565 and 378 neurons,
respectively). b, Interspike-interval (ISI) distributions (mean ± s.e.m.) of
HVC projection neurons that exhibited spiking during singing, at three
stages of vocal development (n = 38, 130, 922 neurons). ISI distributions
computed with logarithmic binning show bimodal structure: the peak
around 3-5 ms indicates inter-spike intervals within bursts, and a broader
peak around 100-400 ms indicates intervals between bursts (dashed line
indicates the 30 ms threshold used for defining a burst; dotted line indicates
peak). Note the refractory period below 1 ms. c, Burst width (top) and
firing rate during bursts (bottom) as a function of developmental stage
(median and quartiles; n = 39, 135, 565, 378 and 32 neurons, respectively;
**P < 0.01, ***P < 0.001, post-hoc comparison with adult stage).
d-i, Syllable-onset-aligned raster plots and histograms for neurons recorded
during the subsong stage. Syllables are sorted from bottom to top by
increasing syllable duration (blue lines indicate syllable offset).
d, Neuron that did not exhibit significant locking to subsong syllable onsets
(RA-projecting neuron, HVCRA; 50 dph; bird 7). e, Another neuron in the
same bird (same neuron as in Fig. 1a; HVCRA; 51 dph). f, g, Two projection
neurons recorded in a different subsong bird (both X-projecting neurons,
HVCX; 47 and 48 dph, respectively; bird 9). Note different latencies of
bursting. h, i, Two projection neurons recorded in a different subsong bird
(both HVCX; 47 and 44 dph, respectively; bird 10). j, k, Syllable-onset-aligned
raster plots and histograms showing strong locking to protosyllables
(bird 2). j, For the same neuron as in Fig. 1b (HVCRA; 62 dph). k, For another
neuron (HVCRA; 65 dph). l, m, Two neurons recorded in the motif stage
(bird 8). l, Neuron locked just after syllable onset (HVCX neuron; 61 dph).
m, Same neuron as in Fig. 1c (HVCRA; 68 dph) showing locking late in the
song syllable. n, Population raster of 14 neurons, aligned to protosyllable
onsets (56-59 dph; bird 1).
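
The 30 ms ISI threshold used above to define bursts can be illustrated with a simple grouping rule: consecutive spikes separated by less than the threshold belong to the same burst. This is a sketch under stated assumptions, not the authors' analysis code. Isolated spikes are kept as one-spike bursts, since no minimum spike count is given here, and burst width is assumed to be the first-to-last spike interval. Both function names are hypothetical.

```python
def detect_bursts(spike_times_ms, max_isi_ms=30.0):
    """Split a spike train into bursts at inter-spike intervals of max_isi_ms or longer."""
    bursts = []
    for t in sorted(spike_times_ms):
        if bursts and t - bursts[-1][-1] < max_isi_ms:
            bursts[-1].append(t)   # ISI below threshold: same burst
        else:
            bursts.append([t])     # ISI at or above threshold: start a new burst
    return bursts

def burst_width_ms(burst):
    """Assumed definition: time from the first to the last spike in the burst."""
    return burst[-1] - burst[0]
```

Because the ISI distribution is bimodal (a peak at 3-5 ms within bursts and another at 100-400 ms between bursts), any threshold in the trough between the two peaks yields nearly the same grouping.
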
Extended Data Figure 2 | Further analysis and examples of HVC
projection neuron activity. a-d, Examples of HVC projection neurons
showing rhythmic activity during non-rhythmic song. a, Bird 2, HVCRA
neuron, 57 dph. b, Bird 12, HVCX, 53 dph. c, Bird 12, HVCRA, 57 dph.
d, Syllable-onset-aligned raster plot for the neuron shown in c. Syllables are
sorted in order of increasing duration (bottom to top; blue line indicates
syllable offset). Also shown (top) is the onset-aligned spike histogram.
Note multiple rhythmic bursts during long syllables. Scale bars: panels
a-c, 1 mV, 100 ms. e-l, Bout-related activity of HVC projection neurons.
e, Bout-onset neuron (HVCX; 44 dph; bird 11). f, Bout-onset-aligned
histogram and raster plot for the neuron shown in panel e. g, Bout-onset-aligned
histogram and raster plot for the neuron shown in Fig. 1d.
h, Distribution of pre-bout-onset latencies for all bout-onset neurons
(n = 187 neurons, 32 birds). i, Bout-offset neuron (HVCX; 61 dph; bird 1).
j, Bout-offset-aligned histogram and raster plot for the neuron shown in
panel i. k, Bout-offset-aligned histogram and raster plot for the neuron
shown in Fig. 1e. l, Distribution of post-bout-offset latencies for all
bout-offset neurons (n = 149 neurons, 32 birds). Vertical scale bars in
panels e and i, 0.5 mV. m-o, Developmental progression of HVC activity
analysed separately for HVCRA and HVCX neurons. m, Fraction of neurons
temporally locked to syllables (mean ± s.e.m.; HVCRA: 9, 22, 83, 54 and 10
neurons analysed at each stage, respectively; HVCX: 27, 91, 376, 244 and
22 neurons analysed at each stage, respectively). n, Fraction of neurons
that exhibited rhythmic bursts (HVCRA: 9, 22, 83, 54 and 10 neurons,
respectively; HVCX: 27, 91, 376, 244 and 22 neurons, respectively).
o, Mean period of HVC rhythmicity as a function of song stage (HVCRA:
0, 16, 50, 41 and 7 neurons, respectively; HVCX: 3, 41, 245, 189 and 18
neurons, respectively). Of the 14 comparisons between HVCRA and HVCX
neurons shown in panels m-o, only the period of HVC rhythm (panel o)
during the motif stage showed a significant difference between the cell
types (P < 0.05 with Bonferroni correction). p-r, Analysis of probabilistic
participation in rhythmic activity during protosyllables. p, Distribution
of the fraction of protosyllables on which spiking occurred (n = 70
neurons). In contrast to the highly reliable bursting of HVC projection
neurons in adult birds, we found that neurons in the protosyllable
stage participated probabilistically (mean: 53% of protosyllables; triangle
symbol). q, Histogram of the coefficient of determination r² for protosyllable
participation across simultaneously recorded pairs of neurons (median
r² = 0.072; n = 11 pairs; see Methods). r, Histogram of mutual information
for protosyllable participation across simultaneously recorded pairs of
neurons (median 0.056 bits; n = 11 pairs; see Methods). s, t, Analysis of
syllable coverage by HVC projection neuron bursts. s, Summary histogram
of the covered fraction for all analysed syllables (n = 20 syllables, 4 birds).
Note that 17/20 syllables had a covered fraction higher than 90%. t, Covered
fraction analysed for 20 syllables for which raster plots are shown in the
main or Extended Data figures. Vertical grey bars indicate 95% confidence
interval (2.5-97.5 percentile) of coverage expected for random uniform
shuffling of the observed bursts (see Methods). Note that for all syllables,
the observed coverage is within the confidence interval for randomly
shuffled bursts. These findings suggest that, even for the three syllables
with coverage less than 90% (indicated with red square symbol), the lower
coverage was consistent with undersampling due to the smaller number of
recorded neurons in these birds. Regarding two models of HVC coding:
our findings bear on several recent models of song representation in HVC.
One earlier model hypothesizes that HVC bursts provide timing signals
to drive premotor activity and to control the temporal precision of
learning. This model implies a continuous, though not necessarily
uniform, coverage of HVC bursts throughout song, as observed in our data.
Overall, given the very large number of HVC neurons in each hemisphere
(>10⁴), our measurements are consistent with a continuous representation
of timing signals throughout song syllables. Another model of HVC coding
has emphasized the finding that bursts may occur more often at particular
times in the song, related to ‘gestures’ in the vocal control parameters.
Our finding that bursts are more concentrated around syllable onsets early
in vocal development suggests that HVC may generate protosyllables as
primitive gestures that serve as a scaffold on which later song syllables
develop. During development, HVC activity appears to evolve such that,
as a population, bursts occur more uniformly throughout song syllables
(Fig. 2c), while the activity of individual neurons becomes sparser and more
precise. At the same time, one might imagine that vocal gestures become
more complex and precise as syllables develop into their adult forms. In this
view, the emergence of sequential activity in HVC may drive an
increasingly complex sequence of gestures.
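
Panels q and r quantify how independently two simultaneously recorded neurons participate in protosyllables. The sketch below assumes that each neuron's participation is coded as a 0/1 sequence (1 if the neuron spiked on a given protosyllable). It computes the plug-in mutual information in bits, without bias correction, and r² as the squared Pearson correlation. The Methods describing the authors' exact estimators are not part of this section, so these choices and the function names are assumptions.

```python
import math

def participation_mi_bits(a, b):
    """Plug-in mutual information (bits) between two 0/1 participation sequences."""
    n = len(a)
    joint, pa, pb = {}, {}, {}
    for x, y in zip(a, b):
        joint[(x, y)] = joint.get((x, y), 0) + 1
        pa[x] = pa.get(x, 0) + 1
        pb[y] = pb.get(y, 0) + 1
    # p(x,y) / (p(x) p(y)) simplifies to c * n / (count_x * count_y).
    return sum((c / n) * math.log2(c * n / (pa[x] * pb[y]))
               for (x, y), c in joint.items())

def r_squared(a, b):
    """Squared Pearson correlation between two participation sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov * cov / (va * vb)
```

Both measures are near zero when the two neurons participate independently, which is the regime indicated by the small medians reported above (r² = 0.072, 0.056 bits). Note that `r_squared` is undefined for a neuron that participates on every protosyllable or on none.
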
Extended Data Figure 3 | Increase in the period of HVC rhythmicity
during alternating syllable differentiation. All data are from bird 1.
a, Paired recording of a shared neuron (top; HVCRA) and a β-specific
neuron (bottom; HVCX; 69 dph). b, Paired recording of a shared neuron
(top; HVCX) and a C-specific neuron (bottom; HVCX; 110 dph). c, Neuron
switching between shared and specific spiking (HVCX; 63 dph). d, Same
neuron as in c, switching from specific to shared spiking. e, A different
neuron switching from shared to specific spiking (HVC; 68 dph).
Scale bars in panels a-e, 0.5 mV, 200 ms. f-i, Inter-burst interval (IBI)
distributions for shared and specific neurons. f, For the neuron in Fig. 3c
recorded during the protosyllable stage. g, For the shared neuron shown in
the top panel of Fig. 3f. h, For the β-specific neuron shown in Fig. 3d.
i, For a γ-specific neuron (not shown). j, Population summary of the
‘most-probable IBI’ for the neurons recorded during the protosyllable stage
(n = 9), and during the emergence of syllables β and γ (62-72 dph; shared
neurons, n = 22; specific neurons, n = 83). Note that shared neurons had
the same ‘most-probable IBI’ as neurons recorded during the protosyllable
stage. Neurons exhibiting an increased burst period by skipping cycles of
an underlying rhythm were also observed in birds 3, 4 and 6 (see
Extended Data Figs 8f-h and 9f, h).
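
The ‘most-probable IBI’ summarized in panel j can be sketched as the centre of the tallest bin of a neuron's inter-burst-interval histogram. The 10 ms bin width and the tie-breaking rule (ties go to the shorter interval) are assumptions, and `most_probable_ibi_ms` is a hypothetical name. The example shows why skipping alternate cycles of a 100 ms rhythm roughly doubles this measure.

```python
def most_probable_ibi_ms(burst_onsets_ms, bin_ms=10.0):
    """Centre of the tallest bin of the inter-burst-interval histogram."""
    onsets = sorted(burst_onsets_ms)
    ibis = [b - a for a, b in zip(onsets, onsets[1:])]
    counts = {}
    for ibi in ibis:
        k = int(ibi // bin_ms)
        counts[k] = counts.get(k, 0) + 1
    best = max(counts, key=lambda k: (counts[k], -k))  # ties go to the shorter interval
    return (best + 0.5) * bin_ms

rhythmic = [100 * k for k in range(20)]   # bursts on every cycle of a 100 ms rhythm
skipping = [200 * k for k in range(10)]   # bursts on alternate cycles only
```

With 10 ms bins, `rhythmic` gives a most-probable IBI in the 100-110 ms bin and `skipping` in the 200-210 ms bin.
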
Extended Data Figure 4 | Analysis of shared neurons: latency and syllable
selectivity. a-d, Latencies of shared neuron bursts, colour-coded by cell type:
HVCRA (red square), HVCX (blue circle), and HVC (green diamond). a, Neurons in
bird 1 shared between syllables β and γ (from Fig. 3) recorded during the early
(top) and late (bottom) stages of syllable differentiation. Note strong
correlation of burst latencies (early, r = 0.91, P < 0.001; late, r = 0.87,
P = 0.005). b, Neurons in bird 1 shared between syllables D and B (Extended
Data Fig. 7) during the early and late stages of syllable differentiation (top,
early r > 0.99, P < 0.001; bottom, late r > 0.99, P < 0.001). c, Neurons in
bird 2 shared between syllables β and α (Fig. 4h) during the early and late
stages (top, early r > 0.99, P < 0.001; bottom, late r > 0.99, P < 0.001). A
shared neuron that had two peaks during the syllable α is shown with an ‘x’
symbol; this point was not included in the calculation of correlation.
d, Neurons in bird 4 shared between ‘b2’ and ‘d1’ (Extended Data Fig. 9l) during
early stage (top, r = 0.89, P < 0.001; neurons that burst in the first part of
‘b’ (‘b1’) are shown with ‘x’ symbol, and were not included in the calculation
of correlation). Neurons in bird 4 shared between syllables ‘c’ and ‘d1’
(Extended Data Fig. 9n) during early stage (bottom, r = 0.98, P < 0.001).
Regarding bias: as a population, shared neurons exhibited a broad range of
selectivity for emerging syllable types; some were equally active for both
syllable types while others showed higher activity in one syllable than the
other (‘bias’; see Methods). e, Raw spike data (top left) and instantaneous
firing rate (bottom left) for a neuron shared between syllables β and γ
(HVC; 68 dph, bird 1). Also shown is the syllable-onset-aligned raster plot
(bottom right) and histogram (top right) showing similar peak firing rates for
both syllables (low bias; bias = 0.07). f, Spike data (left) and
syllable-onset-aligned raster plot and histogram (right) for a high-bias shared
neuron showing higher peak firing rate for syllable β than γ (bias = 0.63;
HVCRA; 68 dph, bird 1). g, Low-bias shared neuron (bias = 0.06; HVC; 69 dph,
bird 2). h, High-bias shared neuron showing higher peak firing rate for
syllable β than α (bias = 0.55; HVC; 68 dph, bird 2). i, Scatter plot of the
peak firing rates during two different syllable types, quantified by the height
of the peak in the syllable-aligned spike histogram. Each dot is a neuron;
shared neurons shown in cyan; neurons near the diagonal have low bias.
Specific neurons are coloured according to the associated syllable and appear
near the axes. j, Distribution of the bias for shared neurons (cyan) and
specific neurons (magenta). Bias ranged from 0, representing equal activity,
to 1, representing activity exclusive to either one of the syllables (see
Methods). Specific neurons exhibited a bias tightly clustered around one
(0.96 ± 0.011, mean ± s.d.). In contrast, shared neurons exhibited a broad
range of bias (0.28 ± 0.22). These observations suggest that individual shared
neurons can exist in a state intermediate between ‘specific’ and ‘shared’,
perhaps reflecting a gradual process by which shared neurons become
specific. Scale bars for panels e-h, 0.5 mV, 100 ms. Insets in panels f and h
show zoom of bursts indicated by an asterisk; scale bar: 5 ms.
Extended Data Figure 5 | Analysis of the acoustic differences associated
with shared neuron bursts. While emerging syllable types gradually
differentiate acoustically, some parts of different emerging syllable types
may be acoustically quite similar. We wondered if shared neurons are
only active at those times within emerging syllables at which no acoustic
differentiation has yet occurred, that is, at times when the emerging syllable
types are acoustically identical. To test this possibility, we analysed the
trajectories of acoustic features of emerging syllable types around the times
of shared neuron bursts. a, Shared HVCRA neuron recorded in bird 1 during
alternation between emerging syllable types β and γ (same neuron as Fig. 3e).
b, c, Average spectrogram (sparse contour representation; see Methods)
computed for syllables β and γ, centred on a 50 ms window immediately
after the burst in each syllable. d, Song amplitude as a function of time for
syllables β (red) and γ (blue), relative to burst time. Lines show average
across all syllable renditions on which the neuron was active. Shading
around lines shows s.e.m. (for this and several other examples, s.e.m. is
too small to be visible). e, Spectral centre of gravity as a function of time
for syllables β (red) and γ (blue). f, Distribution of projected samples for
syllables β (red) and γ (blue), computed by projecting the 8-dimensional
vector of spectral features onto a line that yields maximum separability
between the two syllables. This distribution is computed at each time
(1 ms steps) in the 50 ms analysis window after burst time. Shown is the
distribution at t = 25 ms. g, d-prime analysis of separability of projected
samples for syllables β and γ. The value of d′ is computed as a function
of time (1 ms steps; red trace). Also shown is the 95% confidence interval
(grey band) computed from surrogate data sets with randomized labels.
Dashed horizontal line shows the 95th percentile of the distribution of peak
values of d′ in the surrogate data set (identified in the 10-40 ms window).
h-j, Acoustic analysis for three additional HVC neurons (analogous
to panels a-g). k, Plot of d′ trajectories for all shared HVCRA neurons.
Significant d′ values (above the 95th percentile of peak values) are shown
in red. Non-significant values are shown as grey lines. l, Same as panel k but
for shared HVCX neurons. m, Population summary of mean d′ (averaged
over the presumptive premotor window 10-40 ms after burst time). Each
symbol represents a different shared neuron and each column indicates
a different syllable pair. Analysis is shown separately for each neuron
type: HVCRA neurons (green circles) and HVCX neurons (blue squares).
Neurons with no significant acoustic differences are indicated with black
symbols. n, Cumulative distribution of mean d′ for shared HVCRA neurons
(green; n = 11) and shared HVCX neurons (blue; n = 36). Only neurons
with significant d′ metric are included in the cumulative distribution. No
significant difference was observed between neuron types (P = 0.1). Scale
bars for panels a, h, i, j are 0.5 mV, 100 ms. Summary of properties of HVCRA
and HVCX shared neurons: shared neurons were found in similar proportion
across both HVCRA and HVCX neurons (19% and 28%, respectively;
P = 0.08; averaged over all developmental stages) and shared neurons of both
cell types exhibited the property that bursts have similar latencies during
the shared syllables (Extended Data Fig. 4a-d). As shown above, for both
neuron types, we observed shared neurons that burst at times where there
was a significant acoustic difference between the shared syllables. These
findings suggest that both projection neuron types participate in shared
neural sequences, and that these shared sequences occur during acoustically
distinguishable parts of the emerging syllables.
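
The d′ comparison in panel g can be sketched for a single time point as follows. It assumes that the spectral features have already been projected onto the separating line, that d′ is the absolute mean difference divided by the root-mean of the two sample variances, and that the surrogate threshold is the 95th percentile of d′ after random relabelling. The original analysis repeats this at every 1 ms step and uses peak surrogate values in the 10-40 ms window. The function names and the shuffle count are assumptions.

```python
import math
import random
import statistics

def d_prime(x, y):
    """|mean difference| divided by the root-mean of the two sample variances."""
    pooled_sd = math.sqrt((statistics.variance(x) + statistics.variance(y)) / 2)
    return abs(statistics.mean(x) - statistics.mean(y)) / pooled_sd

def shuffled_d_prime_threshold(x, y, n_shuffles=1000, pct=95, seed=0):
    """pct-th percentile of d' after randomly reassigning syllable labels."""
    rng = random.Random(seed)
    pooled = list(x) + list(y)
    null = []
    for _ in range(n_shuffles):
        rng.shuffle(pooled)
        null.append(d_prime(pooled[:len(x)], pooled[len(x):]))
    null.sort()
    return null[int(n_shuffles * pct / 100) - 1]
```

A pair of syllables counts as acoustically distinguishable at that time point when the observed d′ exceeds the shuffled threshold.
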
Extended Data Figure 6 | Detailed analysis of bout-onset differentiation
in bird 2 (same bird as in Fig. 4). a, Song examples throughout song
development. Panels from top to bottom: first, subsong (49 dph); second,
emergence of protosyllable α from subsong (60 dph); third, appearance
of bout-onset element ε (63 dph); fourth, fusion of ε with first α to form
new syllable β (67 dph); fifth and sixth, acoustic differentiation of β and
α, and incorporation with γ into song motif CBA (70, 90 dph); seventh,
tutor song. b, Schematic of syllable formation (same as Fig. 4a), inferred by
tracking backward in development the adult syllables C, B and A.
Early on, protosyllable (labelled α) is produced rhythmically. The first
protosyllable in each bout fuses with a brief bout-onset vocal element ε to
form a new emerging syllable type β. Both α and β undergo subsequent
acoustic differentiation to form adult syllables A and B, respectively.
(An additional syllable γ emerges at bout onset to form adult syllable C.)
c, Developmental time course of the occurrence probability of different
syllable types at bout onsets (mean ± s.e.m.). d, Syllable duration
distribution showing three non-overlapping peaks (67 dph). Coloured
bars indicate syllable duration ranges used for syllable labelling. This
separation of durations allowed automatic determination of syllable
identity. e, Pitch goodness trajectories of syllables α (red) and β (blue) at
three stages of vocal development (median and quartiles; n = 100 syllables
per day). Black bar, region used to compute data in Fig. 4b. f, Example of
a neuron active during both syllables α and β (HVCRA; 69 dph). Note that
the activity of this neuron during syllable α was weak, and did not quite
reach our statistical criterion for being a ‘shared’ neuron.
Extended Data Figure 7 | Hierarchical differentiation of syllables. All data
are from bird 1 (same bird as in Fig. 3). a, Song examples during the
emergence of syllables B and D from a common precursor syllable β,
which had undergone earlier differentiation from a protosyllable α. Panels
from top to bottom: first (70 dph), after the initial differentiation of the
protosyllable into β and γ (at ~62 dph), the bird produced a rhythmic
alternation of these two syllables, and the alternating sequence was reliably
preceded at bout onsets by a short vocal element ε (ε-β-γ-β-γ-β-γ...).
Note that the first repetition of β in each bout (labelled D) is acoustically
identical to later repetitions (labelled B); second (80 dph), the first repetition
of β in the bout (syllable D) undergoes differential acoustic refinement
compared to later repetitions (syllable B); third, syllables B, C and D, together
with bout-onset element ε, crystallize into adult motif EDCB (90 dph),
which approximately matches the tutor motif (bottom panel). b, Schematic
of syllable formation. c, Scatter plot of the mean Wiener entropy showing
differential acoustic refinement of syllables B (orange) and D (green)
through development (n = 100 syllables of each type per day; horizontal
jitter added to improve data visibility). d, Wiener entropy trajectory of
syllables B and D at three stages of vocal development (median and quartiles;
n = 100 syllables of each type per day). Black bar indicates region used to
compute data in panel c. e, Population raster of 60 neurons early in syllable
differentiation showing shared (top) and specific (bottom) sequences.
f, Same as e, but for 70 neurons recorded late in differentiation of D and B.
Evidence for an incomplete splitting of a neural sequence: the pattern of
shared and specific neurons observed for these syllables is quite similar
to what would be expected in our model during an early/intermediate
stage of splitting (Fig. 5c or Extended Data Fig. 10c). Of particular note in
this bird is the large fraction of shared neurons between B and D that
remained in the later recordings (panel f), compared to the smaller fraction
of shared neurons at late stages in syllables B and C of the same bird (Fig. 3h).
However, syllables B and C differentiated from parent syllable α early in
development (~60 dph, Fig. 3b), while D and B differentiated from β at a
much later stage (~80 dph, panel c). One might speculate that the splitting
of D and B may have failed to reach completion before the bird reached
adulthood, possibly preventing further splitting. Neural evidence (shared
burst sequence) for hierarchical differentiation was also observed in bird 6
(data not shown). Neural evidence (shared burst sequence) for bout-onset
differentiation was also observed in bird 5 (data not shown).
Extended Data Figure 8 | Simultaneous formation of multiple syllable
types into an entire motif. All data are from bird 3. Neural recordings from
this bird support the view that, in the ‘motif strategy’, new syllables emerge
from a common rhythmic protosequence. a, Song examples during the
emergence of a motif. Panels from top to bottom: first, subsong (37 dph);
second, the song began to acquire rhythmic ‘protosyllable’ modulation
in song amplitude around 9 Hz (45 dph); third, over the next five days
(47-51 dph), this bird acquired a reliable pattern of 4-5 acoustically distinct
elements (‘syllables’), each generated in a different cycle of the 9 Hz rhythm
(48 dph); fourth, the acoustic structure in each syllable was gradually
refined, resulting in an excellent match to the tutor song even at this early
age (51 dph); fifth, tutor song. b, Scatter plot of syllable duration and pitch
goodness (n = 300 syllables per day; colour coded according to syllable
identity in panel a). c, Development of song rhythmicity quantified as the
spectrum of the sound amplitude. Grey shade indicates the pass band
for the filter used in phase segmentation. d, Phase segmentation based on
the rhythmicity in the song. Top, song spectrogram with phase segments
(grey boxes). Middle, sound amplitude (blue) and band-pass filtered sound
amplitude (magenta). Syllable segmentation based on the sound amplitude
is shown as white boxes. Bottom, instantaneous phase (green) of the
band-pass filtered sound amplitude. Phase segments (grey boxes) are obtained by
detecting threshold crossings (black dotted line) of the instantaneous phase.
e, Rhythmic neuron (protosyllable stage; HVC; 45 dph). f, Neuron shared
between syllables A and B (HVCRA; 48 dph). g, Neuron shared between
B and E (HVC; 49 dph). h, Population raster aligned to the five-syllable
motif for neurons that were significantly locked to any syllable (n = 10
neurons). Each motif and associated spike times were time-warped using a
piecewise linear method based on syllable onsets and offsets. i, Histogram
of the absolute phase difference between the two syllables for all shared
neurons (n = 8 neurons; mean phase difference: 41 ± 33.9 deg, mean ± s.d.).
j, Cumulative distribution of the mean absolute phase difference after
randomizing burst identity (red dotted line indicates P = 0.05 threshold for
significance; red triangle indicates observed mean absolute phase difference,
P = 0.013). Statistical details in Methods. Scale bars for panels e-g, 30 dB,
0.3 mV, 200 ms.
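
The phase segmentation in panel d can be sketched with NumPy alone. The sound amplitude is band-passed, the instantaneous phase is taken from the analytic signal, and segment boundaries are placed at upward crossings of a phase threshold. This is a hypothetical sketch, not the authors' filter. The band-pass here is a crude FFT brick-wall mask, which also produces the analytic signal by keeping only positive-frequency bins and doubling them. The pass band and the threshold value are parameters, because the caption gives neither number.

```python
import numpy as np

def phase_segment_onsets(amplitude, fs_hz, f_lo, f_hi, threshold_rad=0.0):
    """Hypothetical phase segmentation of a sound-amplitude trace.

    Keeps only positive-frequency FFT bins in [f_lo, f_hi] (an assumed
    brick-wall band-pass) and doubles them, which yields the analytic signal of
    the band-passed amplitude; its angle is the instantaneous phase. Returns the
    sample indices of upward crossings of threshold_rad, plus the phase trace.
    """
    amplitude = np.asarray(amplitude, dtype=float)
    spectrum = np.fft.fft(amplitude)
    freqs = np.fft.fftfreq(amplitude.size, d=1.0 / fs_hz)
    keep = (freqs >= f_lo) & (freqs <= f_hi)   # positive pass-band frequencies only
    analytic = np.fft.ifft(2.0 * spectrum * keep)
    phase = np.angle(analytic)                  # wrapped to (-pi, pi]
    # The wrap from +pi to -pi is a downward jump, so it is not counted.
    crossings = np.nonzero((phase[:-1] < threshold_rad) & (phase[1:] >= threshold_rad))[0] + 1
    return crossings, phase

fs = 1000.0
t = np.arange(2000) / fs                          # 2 s sampled at 1 kHz
amp = 1.0 + 0.5 * np.cos(2 * np.pi * 9.0 * t)     # 9 Hz amplitude rhythm plus DC
onsets, phase = phase_segment_onsets(amp, fs, 7.0, 11.0)
```

For a clean 9 Hz amplitude rhythm the boundaries come out roughly 111 ms apart, one per cycle. Real song amplitude would call for a smoother filter than this brick-wall mask.
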
Extended Data Figure 9 | Another example of shared burst sequences during the emergence of new syllable types. All data are from bird 4. a, Song examples during the emergence of a motif ABCDF. Note the nearly simultaneous emergence of multiple syllable types in nearly fixed order (52 dph). Tutor song shown at the bottom. Phase segments are shown above the spectrogram for song at 43 dph. b, Top, song rhythm spectrum calculated in the protosyllable stage (43 dph) and after motif formation (59 dph). Note the pronounced peaks at 5 Hz and 10 Hz in both stages. Bottom, syllable duration distribution in the protosyllable stage (43 dph) and after motif formation (59 dph) showing two peaks. At 43 dph, the peak at 70 ms indicates short protosyllables corresponding to one cycle of the 10 Hz rhythm, and the peak at 140 ms indicates longer syllables formed by two protosyllables fused across two cycles of the 10 Hz rhythm (doubled protosyllables). Example doubled protosyllables are seen in the first and third syllables of panel a, 43 dph. (Note that boxes at the top of this panel indicate phase segments, not syllable boundaries.) c, Hypothesized mechanism of motif construction, based on the examination of acoustic structure and analysis of neural burst sequences (see below). Notably, in this bird, the majority of syllables emerged nearly simultaneously in a relatively fixed order, consistent with a ‘motif strategy’. d, Scatter plots of syllable duration versus mean spectral centre of gravity at four stages of vocal development (each dot represents a single syllable; n = 500 syllables per day; colour coded according to syllable identity in panel a). e, Neuron bursting at the 10 Hz protosyllable rhythm (HVC; 48 dph). Phase segments shown above spectrogram. f, Top, neuron bursting at the 10 Hz rhythm (HVC_X; 49 dph). Bottom, simultaneous recording of a neuron bursting on alternate cycles of the 10 Hz rhythm (HVC_RA). g, Shared neuron bursting on the second half of syllable ‘b’ (labelled b2) and the first half of syllable ‘d’ (labelled d1) (HVC_RA; 51 dph). h, Shared neuron bursting rhythmically on ‘b1’, ‘c’ and the second half of ‘d’ (d2) (HVC_RA; 51 dph). i, Shared neuron bursting on ‘a’ and ‘d1’ (HVC_RA; 58 dph). j, Shared neuron bursting on ‘d2’, ‘e’ and the last part of ‘f’ (HVC_RA; 57 dph). k, Population raster of 12 neurons that were significantly locked to protosyllable onsets (48-49 dph). Protosyllables were identified using phase segmentation (see Methods). l, Population raster showing neurons active during syllables ‘b’ and/or ‘d’, recorded early in syllable differentiation. Neurons shared between ‘b’ and ‘d’ are grouped at top. Neurons specific for ‘b’ are grouped next, and neurons specific for ‘d’ are grouped at bottom. m, Same as panel l, but for neurons recorded later in development. n, Population rasters showing neurons active during syllables ‘c’ and/or ‘d’, recorded early in development. o, Same as n, but for neurons recorded later in development. Scale bars for panels e-j, 0.5 mV, 200 ms.
Neural evidence for hypothesized mechanism of motif construction: based on an analysis of acoustic signals and neural recordings, we have formulated a hypothesis for how the song of this bird developed, from the formation of the protosyllable to the emergence of the complete motif. We hypothesize that the fundamental protosyllable element corresponds to the prominent 10 Hz peak in the rhythm spectrum and the 70 ms peak in the duration distribution (panel b). This view is further supported by the presence of neurons in the protosyllable stage that generate rhythmic bursts at 10 Hz (panels e and f; 11/18 neurons were rhythmic, 5/11 rhythmic neurons exhibited periodicity at 10 Hz), and by the existence of a burst sequence during the protosyllable (panel k). In this bird, the rhythmic protosyllables differentiated nearly simultaneously, at an early age (52 dph, panel a), into a complete sequence of distinct syllables that subsequently formed the adult song, suggesting that this bird employed a ‘motif strategy’. One complication of this simple view is that there may have been an early partial splitting of the short protosyllable α into two ‘daughter’ protosyllables α1 and α2, which alternated to produce the elements of the final motif (panel c). Two lines of evidence based on neural activity support this view. First, many neurons recorded at an early stage (<50 dph) exhibited a prominent 5 Hz periodicity in their rhythmic bursting (panels f and h; 6/11 rhythmic neurons), rather than the expected 10 Hz periodicity (panels e and f, top trace). This observation led us to consider the possibility that the 100 ms neural sequence, corresponding to the dominant 10 Hz protosyllable rhythm, underwent a partial splitting during the protosyllable stage, similar to the alternating differentiation described for bird 1 (Fig. 3; Extended Data Fig. 4). This would result in two distinct alternating protosyllable sequences α1 and α2 (panel c). Such splitting would effectively double the period of the protosyllable rhythm, and would account for the ‘doubled’ protosyllables and the 5 Hz peak in the rhythm spectrum (panel b). The existence of short and doubled protosyllables led us to hypothesize that the short syllables of the adult motif (‘a’, ‘c’ and ‘e’) arose from the short protosyllables, while the long adult syllables (‘b’ and ‘d’, and possibly ‘f’) arose from the doubled protosyllables (panel c). Early syllable ‘e’ is later dropped by the juvenile, although it appears in the tutor song. Second, the analysis of shared sequences (panels l-o) revealed a predominance of shared neurons between syllable elements in alternating cycles of the underlying 10 Hz rhythm. For example, shared neurons were observed between syllables ‘a’, ‘b2’ and ‘d1’ (panel i for a neuron shared between ‘a’ and ‘d1’; panels g and l for neurons shared between ‘b2’ and ‘d1’). Shared neurons were also observed between syllables ‘b1’, ‘c’ and ‘d2’ (panel h for a neuron shared between ‘b1’, ‘c’ and ‘d2’; panel n for neurons shared between ‘c’ and ‘d2’). In contrast, many
fewer shared neurons were observed between neighbouring cycles of the 
underlying rhythm, although examples of this can be found (panel j). 
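The period-doubling argument can be checked numerically. Suppose alternate 100 ms cycles of a 10 Hz protosyllable train become acoustically different. The amplitude envelope then repeats only every 200 ms, so a 5 Hz component should appear in the rhythm spectrum. The sketch below makes one illustrative assumption: the daughter protosyllables α1 and α2 differ only in amplitude (gain 0.5 on every second cycle). The helper names are hypothetical.

```python
import numpy as np


def protosyllable_envelope(n_cycles, fs, period=0.100, duration=0.070, alternate_gain=None):
    """Synthetic amplitude envelope with one 'syllable' of `duration` s per `period` s cycle.

    If alternate_gain is given, every second cycle is scaled by it. This is a crude,
    assumed stand-in for two alternating daughter protosyllables (alpha1, alpha2).
    """
    n_cycle = int(round(period * fs))
    n_on = int(round(duration * fs))
    env = np.zeros(n_cycles * n_cycle)
    for k in range(n_cycles):
        gain = alternate_gain if (alternate_gain is not None and k % 2 == 1) else 1.0
        env[k * n_cycle : k * n_cycle + n_on] = gain
    return env


def rhythm_spectrum(env, fs):
    """Power spectrum of the mean-subtracted amplitude envelope."""
    power = np.abs(np.fft.rfft(env - env.mean())) ** 2
    freqs = np.fft.rfftfreq(env.size, d=1.0 / fs)
    return freqs, power


def power_at(freqs, power, f):
    """Power in the frequency bin closest to f."""
    return power[np.argmin(np.abs(freqs - f))]


fs = 1000.0
for label, gain in [("single protosyllable", None), ("alternating alpha1/alpha2", 0.5)]:
    freqs, power = rhythm_spectrum(protosyllable_envelope(40, fs, alternate_gain=gain), fs)
    ratio = power_at(freqs, power, 5.0) / power_at(freqs, power, 10.0)
    print(f"{label}: P(5 Hz) / P(10 Hz) = {ratio:.3g}")
```

A strictly 100 ms-periodic envelope has no power at 5 Hz. Any systematic difference between alternate cycles introduces it, whether in amplitude as assumed here or through fusion of two cycles into a doubled protosyllable.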
Extended Data Figure 10 | Model of other strategies for syllable formation. a-d, Bout-onset differentiation results from activation of bout-onset seed neurons (blue arrow) followed by rhythmic activation of protosyllable seed neurons (red arrow). Network diagrams show (a, b) protosyllable formation and (c, d) splitting of chains specific for bout-onset syllable β and specific for later repetitions of the protosyllable α (blue and red, respectively; shared neurons: black). e-h, Model of simultaneous formation of multiple syllable types into an entire motif (‘motif strategy’).
e, f, Protosyllable seed neurons (magenta lines) were activated rhythmically 
to form a protosequence. g, Seed neurons were then divided into three 
sequentially activated subgroups, resulting in the rapid splitting of the 
protosequence into three daughter sequences. In intermediate stages (panel g), individual neurons exhibited varying degrees of specificity and sharedness for the emerging syllable types. h, After learning, the population of neurons was active sequentially throughout the entire ‘motif’, but
individual neurons were active during only one of the resulting syllables, 
forming three distinct non-overlapping sequences. i-k, Network diagrams 
and raster plots showing an example of the formation of a new syllable chain 
at bout onset. In the network diagrams, seed neurons are indicated within 
magenta boxes, and bout-onset seed neurons and protosyllable seed neurons 
are indicated by blue and red arrows, respectively. Neurons specific for each 
emerging syllable type (β and α) are coloured blue and red, respectively. The
three panels represent the early protosyllable stage, the late protosyllable 
stage, and the final stage. The training protocol is similar to that for bout-onset differentiation (panels a-d), except that protosyllable seed neurons are driven more strongly throughout the learning process. As a result, protosyllable seed neurons were not outcompeted by the growing bout-onset chain. Strong activation of the protosyllable seed neurons
also terminated activity in the bout-onset chain through fast recurrent 
inhibition, thus preventing further growth of the bout-onset chain, as 
occurs in bout-onset differentiation. Regarding the role of chain splitting in the formation of new syllable types, we envision that, in our model, the formation of daughter chains in HVC is translated into the emergence of new syllable types as follows. During the splitting process, as two distinct sequences of specific neurons develop, their downstream projections can be independently modified such that each of the emerging chains of specific neurons can drive a distinct pattern of downstream motor commands, allowing distinct acoustic structure in the emerging syllable types. Such differential acoustic refinement is consistent with the previous behavioural observation that the altered acoustic structure of new syllables emerges in place, without moving or reordering sound components (‘sound differentiation in situ’). This model naturally explains the apparent ‘decoupling’ of shared projection neuron bursts from acoustic structure in the vocal output, that is, the fact that the bursts of shared neurons become associated with two distinct acoustic outputs during the differentiation of two syllable types (Extended Data Fig. 5). Specifically, during syllable differentiation, a shared neuron participates with different ensembles of neurons during each of the emerging sequences, and these different ensembles can drive different vocal outputs.
Acute off-target effects of neural circuit manipulations


Timothy M. Otchy, Steffen B. E. Wolff, Juliana Y. Rhee, Cengiz Pehlevan, Risa Kawai, Alexandre Kempf, Sharon M. H. Gobes & Bence P. Ölveczky


Rapid and reversible manipulations of neural activity in behaving animals are transforming our understanding of brain 
function. An important assumption underlying much of this work is that evoked behavioural changes reflect the function 
of the manipulated circuits. We show that this assumption is problematic because it disregards indirect effects on the 
independent functions of downstream circuits. Transient inactivations of motor cortex in rats and nucleus interface (Nif) in 
songbirds severely degraded task-specific movement patterns and courtship songs, respectively, which are learned skills 
that recover spontaneously after permanent lesions of the same areas. We resolve this discrepancy in songbirds, showing 
that Nif silencing acutely affects the function of HVC, a downstream song control nucleus. Paralleling song recovery, 
the off-target effects resolved within days of Nif lesions, a recovery consistent with homeostatic regulation of neural 
activity in HVC. These results have implications for interpreting transient circuit manipulations and for understanding recovery after brain lesions.


Understanding how the brain generates behaviour is a daunting task 
often simplified by studying anatomically distinct brain regions in iso- 
lation. The underlying assumption is that different parts of the brain are 
specialized for different functions that can be understood by monitor- 
ing and altering activity in local circuits. An increasingly powerful and 
widely used approach is to transiently silence or otherwise perturb—by 
optogenetic, pharmacological or other means—neural activity in 
specific circuits and observe the consequences for behaviour. If there
is an effect, the conclusion is that the circuit under investigation is 
causally ‘involved’ in the behaviour. But what does such a causal link 
actually tell us? 

In a densely interconnected dynamical system like the brain, sudden perturbations to one node (for example, a brain area) could send ripples through the system, compromising the capacity of downstream circuits to perform computations on other inputs or generate patterned activity from internal dynamics. Given the reliance on transient circuit manipulations for localizing computations and memory functions to specific neural circuits or brain areas, the caveats and limitations of these methods should be scrutinized.

If inactivating a brain area interferes with the independent functions 
of downstream circuits, that is, functions not contingent on information 
provided by the targeted area, an important next question is whether 
those functions remain compromised after the silencing is made per- 
manent through lesions. For example, deficits caused by changes in the 
excitability of downstream neurons could plausibly resolve through homeostatic regulation of neural activity. Spontaneous recalibration of neural dynamics, more generally, could help explain why chronic effects of permanent lesions are often far less severe than those induced by transient inactivations, and why patients with strokes and other brain injuries can overcome some of their initial deficits without rehabilitation.

Functional recovery after brain lesions, however, is thought to be driven predominantly by the adoption of new behavioural strategies and the adaptive repurposing of non-lesioned circuits, processes contingent on renewed experience with affected tasks. Therefore, demonstrating acute off-target effects of inactivations and spontaneous recovery after permanent lesions requires showing that task-specific behaviours sensitive to transient inactivations can recover after lesions without additional task experience. As experience-dependent recovery is difficult to rule out for basic sensory or motor functions that are central to many behaviours and hence naturally ‘practiced’ after lesions, our study used behaviours for which such incidental practice can be withheld. We chose learned movement sequences of rats and courtship songs of zebra finches because they are task-specific skills associated with complex, stereotyped and idiosyncratic motor patterns that can be precisely quantified and compared across various manipulations.

To probe the effects of transient manipulations on distinct and inde- 
pendent functions of downstream circuits, we targeted brain areas— 
motor cortex in rats and the sensorimotor area Nif in songbirds—that 
are known, based on lesion studies, to be dispensable for storing and 
executing the skills we study. Despite this, transient manipulations severely degraded the learned behaviours in both systems. These
discrepancies were consistent with acute disruptions of downstream 
circuit function. Though we saw similar behavioural effects immedi- 
ately after permanent lesions, they resolved spontaneously, leading to 
full recovery of the initially affected behaviours. 


Motor cortex inactivation disrupts skill execution 

In our motor learning task, rats are rewarded for pressing a lever in a precise temporal sequence (two presses 700 ms apart, Fig. 1a). Animals solve this task by acquiring spatiotemporally precise movement patterns that produce the prescribed lever-press sequence. Though the learned skills are robust to motor cortical lesions, motor cortex projects to sub-cortical motor structures whose distinct functions could be sensitive to sudden changes in motor cortical input (Fig. 1b). To probe
this, we inactivated primary forelimb motor cortex of rats that had learned the task (n = 5 rats) by injecting 100 nl of the GABA_A agonist muscimol (1-25 mM) into the hemisphere contralateral to the dominant paw (Methods). Based on previous studies and injections of 100 nl of fluorescein into motor cortex (Fig. 1c), we estimate that the direct effects of our injections were confined to a volume far smaller than our previous lesions (Fig. 1c; Methods).

Figure 1 | Motor skills that survive motor cortical lesions are acutely affected by transient manipulations of motor cortex. a, We used a motor skill learning paradigm that trains rats to press a lever twice with a specified inter-press interval (IPI), typically 700 ms. b, Schematic of the mammalian motor system. Motor cortex (MC, black) provides input to subcortical circuits (red shaded regions; BG, basal ganglia; CB, cerebellum; BS, brainstem; Th, thalamus; SC, spinal cord). c, Coronal sections comparing the spread of a fluorescent marker (fluorescein) matched in concentration and volume to our muscimol injections (left) with motor cortex lesions that leave the learned skills intact (right; dashed lines denote the lesioned area) (Methods). d, Left, forepaw trajectories associated with five consecutive trials after PBS and muscimol injections for the rat that received the lowest dose of muscimol (100 nl, 1 mM) (Supplementary Video 1). Right, motor cortex was subsequently lesioned in this rat. Paw trajectories associated with five consecutive trials from the last training session before and the first training session after the lesion. e, Fraction of trials with IPIs within 20% of the target (‘successful trials’) for different experimental conditions (n = 5 rats). Lesion data from ref. 16 shown for comparison (light-grey bar). f, Wireless optogenetic stimulation (Methods). g, Paw trajectories for five consecutive trials for an example rat with and without optogenetic stimulation of motor cortex. h, Same as e, but with and without optogenetic stimulation of motor cortex (n = 5 rats). Error bars represent standard error of the mean (s.e.m.). ***P < 0.001. For tests of significance for this and all other figures see Methods.

In contrast to animals tested for the first time 5-10 days after lesions, muscimol-injected rats had severe deficits in skill execution
with marked drops in performance and disrupted paw kinematics 
(Fig. 1d, e). These effects were evident even in the rat receiving the lowest 
concentration of muscimol (1 mM) (rat in Fig. 1d, Supplementary 
Video 1). We later lesioned motor cortex in this rat, and as previously 
reported (rat ‘Kansas’ in ref. 16), saw no effect on skill execution when 
the rat was tested again 10 days post-lesion (Fig. 1d). 
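The success measure used in Fig. 1e and h counts a trial as successful when its inter-press interval lies within 20% of the 700 ms target. That amounts to a simple window test. Below is a minimal sketch, assuming the window edges are inclusive; `fraction_successful` and the IPI values are hypothetical.

```python
import numpy as np


def fraction_successful(ipis_ms, target_ms=700.0, tolerance=0.20):
    """Fraction of trials whose inter-press interval lies within +/- tolerance of the target.

    Inclusive window edges are an assumption of this sketch.
    """
    ipis = np.asarray(ipis_ms, dtype=float)
    if ipis.size == 0:
        raise ValueError("need at least one trial")
    lo, hi = target_ms * (1 - tolerance), target_ms * (1 + tolerance)
    return float(np.mean((ipis >= lo) & (ipis <= hi)))


# Hypothetical IPIs (ms) for five trials; the 20% window around 700 ms is 560-840 ms
print(fraction_successful([705, 650, 820, 450, 1100]))  # -> 0.6
```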

To explore dose-dependence, we injected larger volumes of 
muscimol in two of the rats (200 nl and 400 nl, respectively). We found task performance to be even more affected, with lever interactions restricted to a few single presses and no rewarded trials (Supplementary Video 1).


Optogenetic stimulation of motor cortex 

Transient stimulation of neural activity is an alternative method for 
disrupting ongoing circuit dynamics that is well-suited for interrogating 
processes associated with precise and reproducible neural dynamics.
As with transient inactivations, sudden activation can also plausibly 
affect the dynamics and function of downstream circuits. To probe the
effect of transient motor cortex stimulation on skill execution, we used 
optogenetics, a widely adopted method for manipulating neurons in 
temporally specific ways.

We expressed the optogenetic activator Chrimson in motor cortex (n = 5 rats; Extended Data Fig. 1a, b; Methods), and stimulated
the hemisphere contralateral to the dominant paw after animals had 
reached asymptotic performance on the task (Fig. 1f and Extended 
Data Fig. 1c). Neither brief (50 ms) nor sustained (1 s) optogenetic
stimulation evoked visible motor responses during rest, suggesting that 
they were sub-threshold for movement initiation. However, both brief 
and sustained stimulation, triggered on the first lever-press in a trial, 
interfered with task performance and associated kinematics (Fig. 1g, h, 
Extended Data Fig. 2 and Supplementary Video 2). Thus, similar to 
transient inactivations, disrupting normal activity patterns in motor 
cortex by optogenetic stimulation compromises the animals’ capacity 
to execute skills robust to permanent lesions. 


Effects of Nif lesions and inactivations differ 

Our results suggested that behavioural effects of transient perturba- 
tions may overestimate the steady-state functions of targeted circuits. 
To examine whether this caveat should be considered more broadly, 
we similarly probed whether transient inactivations of sensorimotor 
nucleus Nif in zebra finches affect their courtship songs. Although the 
song survives Nif lesions, Nif sends excitatory projections to HVC,
an essential part of the song control circuit believed to generate the 
temporal pattern for learned vocalizations through intrinsic network 
dynamics (Fig. 2a and Extended Data Fig. 3a).

We first confirmed the findings of previous lesion studies by injecting 27-36 nl of N-methyl-DL-aspartic acid (NMA), an excitotoxin, bilaterally into Nif (n = 5 birds) (Fig. 2a and Extended Data Fig. 3b). When Nif-lesioned birds resumed singing two days after surgery, their songs were similar to pre-lesion songs (Fig. 2b, c), consistent with prior studies. Because birds did not sing within the first day after the lesions, this result does not preclude short-term effects of Nif silencing. To probe such acute effects, we injected 27 nl of muscimol (50 mM) bilaterally into Nif of awake head-restrained adult birds (n = 5 birds; Fig. 2d, e, Extended Data Fig. 4a; Methods).

Birds typically sang within 20 min of the injections, but their songs 
were severely degraded and reminiscent of subsong (Fig. 2f, g), highly 
variable and unstructured utterances normally produced by juvenile 
birds at the start of vocal learning or by adult birds after bilateral HVC lesions. The syllable duration distributions were similar to those reported in HVC-lesioned birds (Fig. 2h), suggesting that Nif inactivation degrades song by indirectly affecting HVC dynamics.

To exclude the possibility that the behavioural effects were caused 
by diffusion of muscimol into HVC, we injected the same dose 
300-500 µm dorsal to Nif but closer to HVC (n = 5 birds), as well as smaller volumes (9 nl) into Nif (n = 2 birds). The control injections
above Nif did not affect song (Fig. 2f, g), whereas the smaller Nif injec- 
tions evoked effects similar to the larger dose (Extended Data Fig. 4b). 
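The song comparisons in Fig. 2c and g use a Wasserstein distance between the joint entropy-duration distributions of syllables. The authors' exact procedure is not described in this section. The sketch below is one way to compute such a distance under stated assumptions:

- the two samples of (duration, entropy) points have equal size;
- both axes are z-scored with pooled statistics;
- the ground distance is Euclidean.

With uniform weights and equal sample sizes, the optimal transport plan is a one-to-one matching. `scipy.optimize.linear_sum_assignment` therefore gives the exact earth mover's distance. The function names and the example data are hypothetical.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist


def standardize_jointly(a, b):
    """Z-score both samples with pooled statistics so ms and entropy units are comparable (assumption)."""
    pooled = np.vstack([a, b])
    mu, sd = pooled.mean(axis=0), pooled.std(axis=0)
    sd[sd == 0] = 1.0  # avoid dividing by zero for a constant feature
    return (a - mu) / sd, (b - mu) / sd


def wasserstein_equal_samples(a, b):
    """Wasserstein-1 (earth mover's) distance between two equal-size point clouds.

    With uniform weights the optimal plan is a permutation, so the distance is the
    mean cost of the optimal one-to-one matching between the two samples.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if a.ndim != 2 or a.shape != b.shape:
        raise ValueError("a and b must both be (n_syllables, n_features) with equal n")
    cost = cdist(a, b)  # Euclidean distance between every pair of syllables
    rows, cols = linear_sum_assignment(cost)
    return float(cost[rows, cols].mean())


# Hypothetical syllables as (duration in ms, spectral entropy) pairs from two days
rng = np.random.default_rng(0)
day1 = np.column_stack([rng.normal(120, 30, 200), rng.normal(-2.0, 0.5, 200)])
day2 = np.column_stack([rng.normal(120, 30, 200), rng.normal(-2.0, 0.5, 200)])
print(wasserstein_equal_samples(*standardize_jointly(day1, day2)))
```

A day-to-day comparison of unperturbed song, like the ‘Baseline’ bar, gives the reference against which post-manipulation distances can be judged.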


HVC dynamics recover after Nif lesion 
Transient circuit manipulations performed in both rats and songbirds 
revealed strong effects on skill execution not seen after permanent 
lesions (Figs 1 and 2). The discrepancy could not be explained by 
experience-dependent relearning in lesioned animals because the skills 
recovered their idiosyncratic pre-lesion form without any intervening 
practice (Figs 1d, e and 2b, c). However, the acute behavioural deficits were consistent with activity manipulations in motor cortex and Nif indirectly affecting the independent functions of downstream circuits. These initially affected circuits, however, seemingly regained their capacity to execute the learned behaviours after permanent lesions.
Unlike in rats, where the neural circuits underlying the skills we assay have yet to be characterized, the vocal control circuits in zebra finches have been well delineated, making it feasible to investigate downstream effects of local circuit manipulations. On the basis of our results and known anatomy (Fig. 2), we hypothesized that Nif inactivations perturb vocal output by removing excitatory input to HVC, thus compromising the function of this song-specialized premotor area.

17 DECEMBER 2015 | VOL 528 | NATURE | 359

© 2015 Macmillan Publishers Limited. All rights reserved

ARTICLE

Most lesion protocols require surgery, which suppresses singing for a day or two, exactly the timeframe during which we hypothesize that recovery in HVC function occurs. To monitor neural dynamics in HVC in the immediate aftermath of Nif lesions and to compare them to pre-lesion dynamics, we lesioned Nif in freely behaving birds while recording multi-unit neural activity in HVC. Stimulation electrodes targeted to Nif were implanted together with recording probes in ipsilateral HVC (Fig. 3a; Methods). Nif was lesioned unilaterally by injecting 50 µA of current for 30-40 s (n = 11 birds; Fig. 3b). Nif was successfully ablated (>80% lesioned; Methods) in 4 out of 11 birds, and subsequent analysis was done on this cohort unless otherwise noted.

Spontaneous (that is, non-vocal) HVC activity was dramatically reduced immediately following the lesions, consistent with a sudden loss of excitatory input from Nif, but recovered in the ensuing hours (Fig. 3c). Singing, which was invariably interrupted by the electrical stimulation, resumed after 1.3 ± 0.9 h (Fig. 3c-e). A fraction of the initial post-lesion vocalizations were severely degraded and did not resemble pre-lesion song (Fig. 3d, e and Extended Data Fig. 5). The effects were less severe than during bilateral Nif inactivation (Fig. 2f, g), probably reflecting bilateral control of zebra finch song.

For vocalizations that resembled pre-lesion song, neural activity was aligned to a common song template (Methods). Although song-aligned activity patterns were similar across renditions in the hours following Nif lesions, they were strikingly different from pre-lesion dynamics (Fig. 3f-h).

Despite the initial degradation of song and associated HVC dynamics, both gradually recovered (Fig. 3e-h). By the second day, the song was reliably back to pre-lesion form (Fig. 3e). Remarkably, the average song-aligned activity patterns in HVC also recovered their pre-lesion structure (Fig. 3f-h). By the third day, the residual difference was consistent with normal drift in the recordings. Interestingly, the temporal structure of song-aligned HVC activity recovered predominantly during the night, while song-related HVC power recovered during the day (Fig. 3i; Methods).

Figure 2 | Transient inactivations of Nif severely degrade adult zebra finch song, while permanent lesions have no noticeable effect when singing resumes two days later. a, Schematic of the song control circuit (red nuclei). Nif, a sensorimotor nucleus that inputs to HVC, was lesioned bilaterally (n = 5 birds). b, Left, spectrograms show two song motifs (syllables ABCD) from an example bird before (top) and after (bottom) bilateral Nif lesion. Right, joint entropy-duration distributions for song syllables uttered before and two days after the lesion. Letters denote syllables in the bird’s song motif. c, Summary statistics showing the difference (Wasserstein distance) between the joint entropy-duration distributions before and on different days after Nif lesions. ‘Baseline’ compares the distributions from two consecutive days of pre-lesion singing. d, Schematic showing bilateral Nif inactivation. e, Histological sections of Nif. Top, injection of cholera toxin into HVC retrogradely labels Nif (green). Bottom, fluorescent dextran co-injected with muscimol (violet). Red arrows denote estimated boundaries of Nif. f, Spectrograms (left) and syllable entropy-duration distributions (right) as in b for an example bird subjected to various injection protocols. g, Same as c, but comparing pre-injection songs to songs after muscimol/PBS injections (n = 5 birds). h, Syllable-duration distributions in Nif-inactivated birds compared to HVC-lesioned birds. Data for HVC-lesioned birds from ref. 26. Shown for comparison are the distributions for baseline and Nif inactivation for the bird in f. Error bars represent s.e.m. ***P < 0.001.

To assess the extent to which acute post-lesion changes in HVC 
dynamics were caused by removal of Nif input versus non-specific 
effects of the current injections, we quantified changes in song-related 
HVC dynamics as a function of Nif lesion size. We found the extent 
of Nif damage to be strongly correlated with changes in song-related 
HVC activity following lesions (Fig. 3j), consistent with the acute 
degradation of song and associated HVC activity being due to removal 
of Nif input to HVC. 
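Fig. 3g-j relies on three quantities:

- the correlation of song-aligned HVC activity to the pre-lesion pattern;
- song-related power normalized to pre-lesion values;
- the relation of these acute changes to lesion size across birds.

A minimal sketch of these follows. It assumes activity has already been time-warped to a common song template and binned, and the function names are hypothetical.

```python
import numpy as np


def correlation_to_template(song_aligned, template):
    """Pearson correlation between trial-averaged song-aligned activity and a template.

    song_aligned : (n_motifs, n_bins) activity, assumed already warped to a common song template
    template     : (n_bins,) average pre-lesion activity pattern
    """
    mean_pattern = np.asarray(song_aligned, dtype=float).mean(axis=0)
    return float(np.corrcoef(mean_pattern, np.asarray(template, dtype=float))[0, 1])


def normalized_power(power_during_song, pre_lesion_power):
    """Mean song-related neural power relative to the pre-lesion mean (1.0 means fully recovered)."""
    return float(np.mean(power_during_song) / np.mean(pre_lesion_power))


def lesion_size_effect(fraction_lesioned, acute_measure):
    """Pearson r across birds between Nif lesion extent and an acute post-lesion HVC measure."""
    return float(np.corrcoef(fraction_lesioned, acute_measure)[0, 1])


# Hypothetical example: a pre-lesion template and a time-shifted post-lesion pattern
bins = np.linspace(0.0, 1.0, 200)
template = np.sin(2 * np.pi * 4 * bins) ** 2
post = np.vstack([np.roll(template, 15) for _ in range(25)])  # 25 motifs, same shifted pattern
print(correlation_to_template(post, template), normalized_power(post, template))
```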


Activity homeostasis explains functional recovery 

The spontaneous and gradual recovery of HVC activity after Nif lesions (Fig. 3c, h) was suggestive of homeostatic regulation of neural activity. To probe whether this could explain the observed song recovery, we modelled HVC as a synaptically connected chain of neurons (a ‘synfire chain’) that receives time-varying excitatory input from Nif (Fig. 4a; Methods). The network generated stable propagation of synchronous spiking activity, much like what is assumed for HVC during singing. Acute removal of Nif input prevented many neurons in the chain from reaching spiking threshold, causing activity propagation to slow and often stop prematurely (Fig. 4c, d). Homeostatic regulation of neural activity in the HVC network was implemented by adaptively adjusting either the spiking threshold (Fig. 4b), the input resistance or the strength of synaptic inputs (Extended Data Fig. 6) of individual HVC neurons (Methods). These mechanisms
all had similar effects: increasing the probability of HVC spiking while 
speeding up chain propagation and decreasing the likelihood of early 
‘song’ terminations (Figs 4b-d and Extended Data Fig. 6). 
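A toy version of this model makes the logic concrete. The sketch below is not the authors' network (see Methods). It makes several simplifications:

- spiking neurons are replaced by binary threshold units arranged in layers;
- Nif is modelled as a constant extra drive of 0.3, an assumed value;
- homeostasis is implemented only as threshold adaptation toward each neuron's pre-lesion firing probability.

Removing the Nif drive makes the volley die out within the first few layers. After repeated threshold updates ('days'), propagation through the full chain is restored. Timing effects such as slowing are not captured by this discrete-layer sketch.

```python
import numpy as np

N_LAYERS, N_PER_LAYER = 20, 50


def run_chain(theta, nif_input, rng, w=1.0, noise_sd=0.1):
    """Propagate one synchronous volley down a layered chain of binary threshold units.

    Layer 0 is seeded; a neuron in layer k fires if
    w * (fraction of layer k-1 that fired) + nif_input + noise > its threshold.
    Returns a boolean spike matrix of shape (N_LAYERS, N_PER_LAYER).
    """
    spikes = np.zeros((N_LAYERS, N_PER_LAYER), dtype=bool)
    spikes[0] = True
    for k in range(1, N_LAYERS):
        drive = w * spikes[k - 1].mean() + nif_input + rng.normal(0.0, noise_sd, N_PER_LAYER)
        spikes[k] = drive > theta[k]
    return spikes


def firing_rates(theta, nif_input, rng, n=50):
    """Per-neuron probability of firing during a rendition."""
    return np.mean([run_chain(theta, nif_input, rng) for _ in range(n)], axis=0)


def homeostatic_thresholds(theta, nif_input, target, rng, days=60, eta=0.1):
    """Nudge each threshold toward its target rate: too quiet -> lower it, too active -> raise it."""
    theta = theta.copy()
    for _ in range(days):
        theta += eta * (firing_rates(theta, nif_input, rng) - target)
    return theta


def completion_probability(theta, nif_input, rng, n=200):
    """Fraction of renditions in which most of the last layer fires (a 'complete song')."""
    return float(np.mean([run_chain(theta, nif_input, rng)[-1].mean() > 0.5 for _ in range(n)]))


rng = np.random.default_rng(1)
theta0 = np.ones((N_LAYERS, N_PER_LAYER))
target = firing_rates(theta0, 0.3, rng)  # intact, pre-lesion firing probabilities
print("intact:", completion_probability(theta0, 0.3, rng))
print("acute lesion:", completion_probability(theta0, 0.0, rng))
theta_rec = homeostatic_thresholds(theta0, 0.0, target, rng)
print("after homeostasis:", completion_probability(theta_rec, 0.0, rng))
```

Threshold adaptation was chosen here only because it is the simplest of the three mechanisms named above. Scaling `w` per neuron would be a comparable sketch of synaptic homeostasis.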
Given that HVC is known to control song timing, our simulations generated predictions for the temporal structure of song following Nif lesion (Fig. 4d). In agreement with these predictions, we found a transient increase in premature song terminations (Fig. 4e, f and Extended Data Fig. 5). Upon closer inspection, we also found that the song slowed down after the lesion, only to recover in the ensuing days (Fig. 4f), again consistent with the qualitative predictions of our model. Though we cannot exclude other mechanisms, our network simulations are consistent with functional recovery after Nif lesions being due to homeostatic regulation of neural activity in HVC.

Figure 3 | Initial disruption and subsequent recovery in vocal performance and HVC dynamics following Nif lesions. a, Schematic of the experiment: lesioning Nif unilaterally while continuously recording neural activity from ipsilateral HVC. b, Histology of an electrolytic Nif lesion. c, Left, spontaneous activity in HVC before and at different time points after unilateral Nif lesion in an example bird. Right, recovery of spontaneous HVC activity normalized to pre-lesion rates averaged across Nif-lesioned birds (n = 4) (lesion indicated by red arrow). Spontaneous activity recovered with a time constant of 3.4 ± 1.5 h. Shaded region denotes s.e.m. d, Representative spectrograms (left) and joint entropy-duration distributions (right) for songs of an example bird before and at different times after unilateral Nif lesion. e, Summary statistics showing the difference (Wasserstein distance) between the joint entropy-duration probability distributions before, and at different times after, unilateral Nif lesions. Baseline (‘Bsl’) compares the distributions from two consecutive days of pre-lesion singing. f, Recovery in song-related HVC dynamics following Nif lesion for the bird in d. Left, song-aligned neural power in HVC for song motifs uttered on the day of Nif lesion (red arrow) and until 3 days after. Middle, correlation between the song-aligned HVC activity pattern and the average pre-lesion activity pattern. Right, average neural power in HVC during singing. Bottom-left panel, average song-aligned neural activity right before and after the lesion, and 3 days later. Bottom-right panel, same as on the left, but showing normal drift in the neural recording over the 3 days preceding the lesion. g, Recovery in song-aligned HVC activity following unilateral Nif lesions measured as the Pearson's correlation to the average pre-lesion activity pattern and averaged across 4 birds (red trace, Methods). Data points correspond to the first and last batch of 25 song motifs on each day. The control trace (black) comes from recordings in 3 of the 4 lesioned birds, but before the Nif lesions, and represents the expected drift in HVC recordings. h, Similar to g, but showing recovery of the mean neural power in HVC during song, normalized to pre-lesion power. i, Recovery in HVC dynamics parsed by day (start to end of day-time singing) and night (end of singing on one day to start of singing the next), over the first 60 h post-lesion. Top, recovery in the correlation to pre-lesion HVC dynamics. Bottom, recovery of mean neural power. j, Correlation between song-aligned HVC activity before and immediately after Nif lesions (top), and the mean song-related neural power immediately after lesions normalized to pre-lesion values (bottom), as a function of the fraction of Nif lesioned. Grey-filled circles identify birds with >80% Nif lesions that were included in the summary analyses (c, e-i). Error bars represent s.e.m. ***P < 0.001, **P < 0.01.


Discussion
Although efforts to understand the brain must necessarily rely on reductionist approaches, the simplifications and assumptions made in this pursuit must be scrutinized to prevent misleading conclusions. Given the increased reliance on transient circuit manipulations (for example, optogenetics, pharmacology, pharmacogenetics, cooling and transcranial magnetic stimulation) for localizing brain function, we
tested whether behavioural effects induced by sudden activity pertur- 
bations reliably reflect the computations carried out in targeted areas. In 
two different systems, the deficits induced by transient manipulations 
seemingly overestimated the steady-state function of the examined 
circuits (Figs 1 and 2). This could be explained by the manipulations 
acutely affecting the independent functions of downstream circuits. 
We found that such off-target effects can resolve after the targeted area 
is permanently lesioned (Fig. 3). Importantly, the post-lesion recovery 
process did not require any renewed experience with the task, and was 
consistent with homeostatic regulation of neural activity (Fig. 4). 
Figure 4 | Homeostatic regulation of spiking activity in HVC neurons can account for the functional recovery after Nif lesions. a, HVC is modelled as a chain of synchronously firing neurons, with each neuron receiving a different time-varying input from Nif (Methods). b, Top, Nif removal reduces the probability of spiking during simulated ‘song’ in a model neuron (from the 40th node). Bottom, homeostatic regulation of neural activity is implemented by adaptively adjusting the spiking threshold (Methods). c, d, Behaviour of the HVC model network after removal of Nif input and subsequent homeostatic regulation of single neuron firing rates. c, Fifty simulated ‘songs’ before and at two different times (1,500 and 4,000 simulations respectively) after Nif removal. Completed ‘songs’ in grey; truncated ones in red. d, Top, fraction of simulations for which activity in the model HVC network propagated to the end. Bottom, average duration of a full chain-propagation as a function of homeostatic recovery. e, f, Similar to c and d, respectively, but for birds with unilateral (n = 4) and bilateral (n = 5) Nif lesions. f, Motif completion
rates (top) and tempo (bottom) relative to pre-lesion baseline. Data for 
intact birds come from a subset of the birds that were later lesioned. Error 
bars represent s.e.m. 
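As a toy illustration of the mechanism summarized in panels a-d, and not the model described in Methods, the Python sketch below simulates a feedforward chain in which each node fires if its input (excitation from the preceding node, a tonic 'Nif' drive and Gaussian noise) exceeds a threshold, and each node's threshold is nudged toward a target spike probability every time the node is reached. The constant (rather than time-varying) Nif input, the chain length and every parameter value are hypothetical choices made only for this sketch.

```python
from statistics import NormalDist

import numpy as np

# Hypothetical parameters for a toy chain; none are taken from the paper.
N_UNITS = 20      # nodes in the feedforward chain
W_CHAIN = 1.0     # excitation a node receives from its predecessor
NIF_DRIVE = 0.5   # tonic 'Nif' input to every node (constant in this sketch)
NOISE_SD = 0.1    # s.d. of Gaussian noise on each node's input
TARGET = 0.999    # homeostatic set point: desired spike probability per node
ETA = 0.05        # homeostatic learning rate for the threshold


def run_song(thresholds, nif_drive, rng):
    """Propagate one 'song' down the chain, updating thresholds in place.

    Returns True if activity reached the last node (a completed 'song').
    """
    for i in range(len(thresholds)):
        fired = W_CHAIN + nif_drive + rng.normal(0.0, NOISE_SD) > thresholds[i]
        # Homeostasis: a failure lowers the threshold by about ETA, a spike raises
        # it by ETA * (1 - TARGET), so the spike probability drifts toward TARGET.
        thresholds[i] += ETA * (float(fired) - TARGET)
        if not fired:
            return False  # propagation failed: a truncated 'song'
    return True


def completion_rate(thresholds, nif_drive, n_songs, rng):
    """Fraction of simulated 'songs' in which activity reached the end of the chain."""
    return float(np.mean([run_song(thresholds, nif_drive, rng) for _ in range(n_songs)]))


rng = np.random.default_rng(0)
# Start each threshold at the set point of the intact network (with Nif drive).
z = NormalDist().inv_cdf(TARGET)
thresholds = np.full(N_UNITS, W_CHAIN + NIF_DRIVE - z * NOISE_SD)

before = completion_rate(thresholds, NIF_DRIVE, 500, rng)
just_after = completion_rate(thresholds, 0.0, 20, rng)  # Nif input removed
completion_rate(thresholds, 0.0, 4500, rng)              # homeostatic recovery
recovered = completion_rate(thresholds, 0.0, 500, rng)
print(before, just_after, recovered)
```

With these made-up parameters, removing the Nif drive should truncate nearly every simulated 'song' at first, and completion should return toward its pre-removal level as the thresholds drift down, qualitatively resembling panel d (top). The sketch does not attempt to model tempo.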


Discrepancies between acute and chronic behavioural effects of 
targeted inactivations/lesions have been recognized in other con- 
texts!°-?4 Acute effects are almost invariably more severe, a discrep- 
ancy typically explained by the brain adaptively compensating for lost 
function after lesions’>. By not allowing time for experience-dependent 
compensation, transient circuit manipulations are seen as overcoming 
this ‘caveat’ of lesions. However, if the goal is to assign computations 
and memory functions to specific brain areas, our results suggest that 
transient circuit manipulations may have their own interpretive diffi- 
culties that stem from acute effects on the independent functions of 
non-targeted circuits. 

That the function of a circuit can be sensitive to sudden perturba- 
tions in chronically non-essential inputs is not surprising. The brain—a 
finely tuned, complex, and heavily interconnected dynamical system— 
operates in a fairly limited dynamic regime’, making it plausible that 
local circuit perturbations could interfere with the dynamics and inde- 
pendent functions of remote circuits*’. For example, sudden removal of 
permissive inputs could tilt a network’s excitatory-inhibitory balance*®, 
thus compromising its function*!. This is seemingly what happens to 
HVC after Nif is silenced. Loss of excitatory input from Nif causes an 
acute decrease in the activity of HVC neurons*, rendering the network 
incapable of producing its normal output (Figs 3f, g and 4). 

The intricacies of dissecting interconnected biological networks and 
assigning functions to discrete nodes in those networks have been rec- 
ognized in other contexts, including genetic and molecular networks”. 
In such studies, the distinction between permissive and instructive 
functions is routinely made**“*. Our results suggest that a similar distinction should be considered when interrogating the role of neural circuits in behaviour*’, with a circuit being classified as ‘permissive’ if its activity is acutely required for the expression of a behaviour without providing essential information for any of the underlying computations or memories. In contrast, a brain area should be considered ‘instructive’ if it contributes essential information or computation not otherwise
available to the system implementing the behaviour. 

Although the behavioural effects of sudden activity perturbations 
may not reliably reflect the steady-state function(s) of a circuit, lesions 
can, in certain cases at least, contribute additional insight. Permanent 
silencing of Nif and motor cortex suggested that the capacity of these 
brain areas to influence the respective skills we study—evident from 
transient manipulations—is not exercised under normal conditions, 
consistent with permissive roles. That Nif and motor cortex have access 
to the essential control circuits likely reflects instructive roles for these 
brain areas in behavioural processes that we did not test. Nif, for exam- 
ple, provides early auditory priming of HVC essential for imitative song 
learning”’, while motor cortical input to subcortical motor circuits 
is required for the initial acquisition of the skills we train¹⁶ and for
modulating other low-level motor behaviours*®. 

Importantly, neural circuit function acutely compromised by sudden 
changes in permissive input can recover after those inputs are perma- 
nently silenced. Both skills we studied recovered after lesions without 
any task-specific practice, suggesting largely spontaneous recovery 
processes. Although the mechanisms that underlie such recovery will 
need to be further examined, our results are consistent with a role for 
homeostatic regulation of neural activity’? (Fig. 4 and Extended Data 
Fig. 6). A similar recovery to the one we observed in HVC of songbirds 
(Fig. 3) has been described for the network underlying the pyloric 
rhythm in crustaceans®, where homeostatic regulation of neuronal 
dynamics is thought to underlie the recovery of circuit function after 
removal of permissive or modulatory input?. 

Interestingly, we found that the structure of song-aligned HVC 
activity recovered predominantly overnight, while overall HVC power 
recovered during the day (Fig. 3i). This dissociation is consistent with 
the synaptic homeostasis hypothesis of sleep*” that posits synaptic 
potentiation during wakefulness and synaptic rescaling and memory 
consolidation during sleep. Our results suggest that sleep not only con- 
solidates activity patterns associated with recent experiences“’, but may 
help restore previously established circuit dynamics, and could hence 
promote functional recovery after brain lesions”. 

As in our experimental animals, patients with lesions to motor- 
related brain areas have motor deficits that resolve in the days and 
weeks following the injury°’. Aspects of this recovery are thought to 
be independent of rehabilitation'’, suggesting spontaneous processes 
at work. Diaschisis is a broad clinical term referring to the temporary 
effects of focal brain lesions on remote brain areas’, yet the underlying 
mechanisms remain poorly understood“! Our results suggest that focal 
brain lesions can affect neural dynamics and function in remote brain 
areas, and that homeostatic regulation of neuronal dynamics may help 
resolve such acute effects, thus contributing to functional recovery after 
brain injury. 

Online Content Methods, along with any additional Extended Data display items and Source Data, are available in the online version of the paper; references unique to these sections appear only in the online paper.
METHODS


Animals. The care and experimental manipulation of all animals were reviewed 
and approved by the Harvard Institutional Animal Care and Use Committee. 
Experimental subjects were female Long Evans rats 3-8 months old at start of 
training (n = 10, Charles River) and adult male zebra finches between 92-205 days
post-hatch (n = 27). Because the behavioural effects of our circuit manipulations 
could not be pre-specified before the experiments, we chose sample sizes that 
would allow for identification of outliers and for validation of experimental repro- 
ducibility. No animals were excluded from experiments post-hoc. The investigators 
were not blinded to allocation during experiments and outcome assessment, unless 
otherwise stated. 

Behavioural training in rats. Ten rats were trained in the lever-pressing task as 
previously described¹⁶. Water-restricted animals were rewarded with water for
pressing a lever twice with a prescribed interval between the presses (700 ms for 9 of 
the rats, 600 ms for one). All animals were trained using our fully automated home- 
cage training system®’. Kinematic tracking of forepaw movements (Fig. 1d, g) 
was done as in ref. 16. 

Motor cortex inactivations in rats. In rats (n = 5) that had reached asymptotic 
performance in our task¹⁶, a craniotomy was made to access the caudal forelimb
area of primary motor cortex (CFA) in the hemisphere contralateral to the paw 
most involved in the lever-press sequence. The centre of the CFA was estimated 
from stereotactic coordinates (+1.0 mm anterior, +3.00 mm lateral, with respect 
to bregma). Kwik-Cast sealant (WPI) was applied to cover the exposed dura. In
addition, a protective acrylic cap covering the craniotomy was attached with screws 
to three nuts secured to the skull with Metabond (Parkell). After recovering from 
surgery (10 days), animals were trained for at least one additional week to ensure 
that they were at their asymptotic performance levels. 

On injection days, rats were lightly anaesthetized with 0.5-1.5% isoflurane and 
placed in a stereotax. Motor cortex was accessed by removing the custom-made 
protective cap and the Kwik-Cast plug covering the craniotomy. Muscimol (or PBS for control) was injected at the estimated centre of the CFA®, 1.5 mm deep, in 9.2 nl increments every 10 s using a Nanoject (WPI). The craniotomy was resealed with Kwik-Cast and the protective cap reattached. The whole procedure took 15-30 min, and rats resumed normal behaviour a few minutes later. Training sessions started 1.5 h after the injections. ‘Baseline’ performance included sessions after the
craniotomies but before injection. Experimental days alternated between saline 
and muscimol injections. To prevent any behavioural compensation in response 
to muscimol-induced performance deficits, injected animals were tested for 
only 10 min. 

The dosing of muscimol was based on two criteria. (1) To allow comparisons with our lesion study¹⁶, the direct effect of muscimol injections should be
restricted to a volume of motor cortex equal to or smaller than what we lesioned in 
ref. 16 (Fig. 1c). (2) To quantify effects on kinematics and performance, drug dos- 
ing should not abolish task engagement. We injected increasing concentrations and 
volumes of muscimol in two rats, and found that relatively larger doses (200-400 nl 
of 25 mM muscimol) degraded performance to the point where animals quit the
task (Supplementary Video 1). 

We converged on a dose of 100 nl of 25 mM muscimol because it generally did 
not prevent engagement with the task and because we estimated that it affects 
a volume significantly smaller than what we lesioned in ref. 16. Our estimate 
of muscimol spread is bounded by previous studies that injected larger doses 
of muscimol”? (1 μl of 2 and 9 mM, respectively) and showed affected volumes of ~4-14 mm³. In comparison, our motor cortex lesions were larger than 23 mm³ (ref. 16). We also injected 100 nl of 25 mM fluorescein in one animal using the same protocol as for the experimental animals, euthanizing it 1.5 h after the injection. Its
brain was later sectioned, and the approximate spread of the dye visualized using 
fluorescence microscopy (Fig. 1c). One of the experimental animals had very severe 
performance deficits at our chosen dosing, preventing us from characterizing the 
behavioural effect (that is, very few successful trials). In this animal we reduced 
the concentration to 1 mM, at which task engagement was robust but performance was still affected (Fig. 1d).

Optogenetic stimulation. Viral injections. Adeno-associated virus (AAV2/8-hSyn-FLEX-ChrimsonR-tdTomato, UNC vector core”; titre: 5 × 10¹² vector genomes (vg) per ml) was injected into the forelimb motor cortex of isoflurane-anaesthetized rats (n = 5) through multiple small craniotomies (A/P, M/L: +1, +2; +1, +4; +1.5, +2.75; +2.25, +2.5; +3, +2, coordinates relative to bregma). Injections were done in 9.2 nl increments while slowly moving the injection pipette (Nanoject) from a depth of 1.5 mm to 0.7 mm for a total volume of 0.4 μl per site and 2 μl per
hemisphere (Extended Data Fig. 1a). Animals were allowed to recover for 5 days 
before starting behavioural training. 

LED implant and stimulation. Once animals reached asymptotic performance 
on our task¹⁶, they underwent a second surgery to implant a custom-built device for optogenetic stimulation (Extended Data Fig. 1c). The device consisted of a red light-emitting diode (LED) (λ = 615 nm, 110 mW output power, XLAMP XPC LED RD-ORANGE, Cree) on a printed circuit board, powered by two coin-cell batteries (CR2032). An infrared (IR)-sensitive photodiode was used to wirelessly control the LED. After device implantation and recovery, animals resumed behavioural training. An IR light source placed on top of the training cage was activated to trigger the LED for the duration of the optogenetic stimulation (1 s or 50 ms, continuous light). Stimulation trials were at least 10 s apart to allow the batteries of the LED to recover. Between stimulation trials, rats performed a varying number of non-stimulated trials (range: 0-5), resulting in ~30% of the trials being ‘stimulated’. Optogenetic stimulation was repeated for several sessions (5-12). Batteries
were changed daily. 

Functional verification. To characterize the effects of optogenetic stimulation on 
motor cortex activity, acute electrophysiological recordings were done after ter- 
mination of the behavioural experiments in two of the rats. The animals were 
anaesthetized and placed in a stereotactic frame. The implanted LED device was 
carefully removed to expose the previous craniotomy. Using a custom-built record- 
ing setup and silicon probes (Buzsaki-64, Neuronexus), we recorded single-unit 
activity in motor cortex below the craniotomy. The removed LED device was placed 
next to the silicon probe above the craniotomy. Once stable units were detected, 
we triggered the LED and illuminated motor cortex for 1 s (30 trials). Recordings were performed at multiple depths (0.1 mm to 2.5 mm). Units were classified as light-responsive if at least two consecutive 5 ms bins during the first 200 ms of illumination had a significant z-score (compared to 1 s of baseline before light onset). These included units with long onset latencies (>10 ms), consistent with indirect activation (Extended Data Fig. 1b). The relatively high fraction of light-responsive units (69 ± 3%), compared to the fraction of cells counted as infected by immunohistochemistry (31 ± 2%; see below), is likely due to such indirect effects.
Moreover, many of the recorded light-responsive cells were only identified during 
stimulation, further biasing our results to responsive cells. 
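The light-responsiveness criterion above can be sketched in Python as follows. This is a minimal version under stated assumptions: spike counts are pooled across trials into 5 ms bins, each bin in the first 200 ms of illumination is z-scored against the mean and s.d. of the 200 baseline bins, and 'significant' is taken to mean |z| > 1.96. The pooling scheme and the threshold are assumptions of this sketch, not values given in the text.

```python
import numpy as np

BIN_S = 0.005      # 5 ms bins
BASELINE_S = 1.0   # 1 s of pre-light baseline
WINDOW_S = 0.2     # first 200 ms of illumination
Z_CRIT = 1.96      # assumed criterion: |z| > 1.96 (about p < 0.05, two-sided)


def is_light_responsive(spike_times, light_onsets):
    """True if >= 2 consecutive 5 ms bins in the first 200 ms of light are significant.

    spike_times and light_onsets are 1-D arrays in seconds; counts are summed over trials.
    """
    n_base = int(round(BASELINE_S / BIN_S))
    n_win = int(round(WINDOW_S / BIN_S))
    edges = np.arange(-n_base, n_win + 1) * BIN_S   # bin edges relative to light onset
    counts = np.zeros(n_base + n_win)
    for onset in light_onsets:
        counts += np.histogram(spike_times - onset, bins=edges)[0]
    baseline, window = counts[:n_base], counts[n_base:]
    sd = baseline.std() or 1.0  # guard against a silent baseline (assumption)
    z = (window - baseline.mean()) / sd
    significant = np.abs(z) > Z_CRIT
    return bool(np.any(significant[:-1] & significant[1:]))


rng = np.random.default_rng(1)
onsets = 2.0 + 3.0 * np.arange(30)   # 30 light pulses, 3 s apart (hypothetical)
trains = []
for onset in onsets:
    trains.append(onset - 1.0 + rng.uniform(0.0, 1.0, rng.poisson(10)))  # ~10 Hz baseline
    trains.append(onset + rng.uniform(0.0, 0.2, rng.poisson(20)))        # ~100 Hz during light
spike_times = np.sort(np.concatenate(trains))
print(is_light_responsive(spike_times, onsets))    # strongly driven synthetic unit
print(is_light_responsive(np.array([]), onsets))   # silent unit
```

Because the criterion only asks for two consecutive significant bins, it also flags late, indirectly driven units, consistent with the long-latency responses noted above.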

Histological verification. At the end of the experiments, animals were transcar- 
dially perfused with PBS and subsequently fixed with 4% paraformaldehyde 
(PFA) in PBS. Brains were removed and post-fixed for at least 24 h. Brains were sliced coronally (thickness: 80 μm) and immunohistochemistry performed to determine the AAV injection site and extent of the transfection (Extended Data Fig. 1a). Slices were blocked (1% BSA, 0.3% Triton in PBS) at room temperature and incubated with anti-RFP (chicken, 1:1,000, Millipore, AB3528) and anti-NeuN (mouse, Millipore, MAB377) primary antibodies in blocking buffer for 48 h at 4°C. After washing, slices were incubated with anti-chicken-Alexa 568 (goat, 1:1,000, Life Technologies, A-11041) and anti-mouse-Alexa 647 (goat, 1:1,000, Life Technologies, A-31625) overnight at 4°C. Slices were mounted and imaged using a
Zeiss Axio Scan Z1 Slide Scanner for overview images and an Olympus FluoView 
FV 1000 confocal microscope for high-resolution images. We verified the targeting 
and spread of all injections based on the fluorescent signal and determined the 
extent of the AAV injections in a subset of animals (8.2 mm?+1.3mm*; n=2 
rats). In addition, we chose 4 regions of interest (size 635 × 635 μm) and counted the number of infected cells relative to the number of neurons (NeuN⁺ cells) to determine the fraction of infected cells (31 ± 2%; n = 2 rats). Histology was done
blind to the outcome of the experiment. 

Data analysis for rat experiments. To assess the behavioural effects of the differ- 
ent injections in the acute inactivation experiments, we measured performance 
relative to ‘baseline’ training sessions after the craniotomies but before any injec- 
tions. To standardize analysis across experimental conditions (muscimol, PBS, or 
baseline), we only included data from the first 10 min of each session, matching 
the duration of the muscimol sessions. For the optogenetic stimulation experi- 
ments, ‘baseline’ was defined as the non-stimulated trials in the same sessions. 
The number of sessions for each condition ranged from 1-3 (injections) and 5-12 
(optogenetic stimulations). Data from training sessions of a given condition were 
pooled for each animal. To quantify behavioural performance, the fraction of trials 
with an inter-press interval (IPI) within 20% of the target IPI was calculated. Data 
on motor cortex lesioned animals presented for comparisons in Fig. 1e comes from previously published experiments¹⁶, and includes sessions from the second week
of post-lesion training. 
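The performance measure above is simple to express in code. This is a minimal Python sketch, assuming IPIs in milliseconds and treating the ±20% boundary as inclusive (the text does not specify); the 700 ms default matches the target used for most rats, and the IPI values are made up.

```python
import numpy as np


def fraction_within_target(ipis_ms, target_ms=700.0, tolerance=0.2):
    """Fraction of trials whose inter-press interval lies within +/-20% of the target."""
    ipis = np.asarray(ipis_ms, dtype=float)
    return float(np.mean(np.abs(ipis - target_ms) <= tolerance * target_ms))


ipis = [690, 720, 540, 850, 700, 910]   # made-up IPIs (ms) from one session
print(fraction_within_target(ipis))     # 3 of 6 trials fall in 560-840 ms -> 0.5
```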

Zebra finch experiments. All birds were obtained from the Harvard University 
zebra finch breeding facility and housed on a 13:11 h light/dark cycle in acoustic
isolation with food and water provided ad libitum. 

Pharmacological lesions. Birds (n = 5) were anaesthetized with isoflurane. Nif was localized antidromically by electrical stimulation in HVC”. Bilateral Nif lesions were made by injecting the excitotoxin N-methyl-DL-aspartic acid (NMA, 4%) into each hemisphere using a Nanoject (WPI). In initial experiments, a single 27 nl bolus of NMA was injected into the centre of Nif. Though this volume produced complete bilateral Nif lesions in one animal, we found that complete lesions were more reliably produced by injecting two boluses of 18 nl (for a total of 36 nl) 200 μm apart along the anterior-posterior axis. We report on the five animals that had 100% bilateral Nif lesions, determined by post-hoc histological inspection (see below, Extended Data Fig. 3b). One of these received 27 nl and four received 36 nl injections.
Reversible inactivations. Birds (n = 5) were anaesthetized and Nif identified as described above. Craniotomies over Nif were covered with artificial dura (Body Double Fast; Smooth-On, Inc.) and head screws were attached to the skull with dental cement as previously described‘. Following post-surgical recovery, awake birds were placed in a foam restraint and head-fixed to a stereotax for ~10 min each morning for 10-14 days to desensitize them to handling and restraint. Following the desensitization training, all birds reliably sang within 30 min of the restraint. In the morning of experimental days, muscimol (27 nl, 50 mM) or PBS (27 nl) was injected bilaterally as described in the text. In two birds that routinely sang within 10 min of drug administration, we also injected a smaller dose (9 nl) of muscimol
into Nif to additionally verify that song degradation was due to direct inactivation 
of Nif (Extended Data Fig. 4b). 

We note that a previous study aimed at reversibly inactivating Nif in adult song- 
birds failed to show any obvious effect on song structure*, but conflicting results 
from experiments in juvenile birds and methodological uncertainties regarding 
drug injection volumes in adult birds make its conclusions tentative. This previous 
report notwithstanding, all our muscimol injections into Nif produced similarly 
severe song degradation (Fig. 2f and Extended Data Fig. 4b). 

Implantation of recording and stimulation device. Zebra finches (n = 11) were anaesthetized with isoflurane and placed in a stereotax. HVC was identified by antidromic stimulation from Area X as previously described”’. Nif was similarly identified by stimulating in HVC. For birds targeted for electrolytic Nif lesions, we placed either a monopolar stimulating electrode at the dorsal-posterior edge of Nif (n = 8 birds) or a bipolar stimulating electrode straddling Nif in the medial-lateral plane (n = 3 birds). A custom recording array (3 channels; 100 kΩ) was implanted in the
hemisphere ipsilateral to the Nif-stimulating electrode and within the identified 
boundaries of HVC as previously described”. All birds exhibited normal song 
output within 7 days of surgery. Following completion of the experiment, animals 
were euthanized, their brains collected, and the placement of recording electrodes 
and extent of lesions confirmed histologically. 

Neural and behavioural recordings. Sound and neural activity were recorded using 
a custom LabVIEW application (National Instruments) as previously described”. Multi-unit neural activity was recorded from up to three sites in HVC (~250 μm
spacing) for three to four weeks per bird. Because stability of the neural recordings 
is crucial for estimating recovery in HVC dynamics, analysis was done on data 
collected at the most stable recording site in each bird (determined pre-lesion), 
though we note that the trends were similar across all channels. 

Electrolytic lesions. Electrolytic lesions of Nif were made in the right hemisphere by 
passing 50 μA of monophasic current through the stimulating electrode for 30-40 s.
Current injections started while birds were singing, and in all cases immediately 
terminated song output. Lesion extent was estimated post-hoc as described below. 
Histological verification of lesions and inactivations. At the end of the experiments, 
birds were anaesthetized with sodium pentobarbital (Nembutal, IM) and transcardially perfused with PBS, followed by fixation with 4% PFA in PBS. Brains were removed and post-fixed in 4% PFA overnight. Parasagittal sections (75 μm)
were cut on a Vibratome (Leica), mounted, and stained with cresyl violet to recon- 
struct the location of implanted electrodes and lesions (ImageJ). Identification of 
the injection sites for the muscimol inactivations and circuit tracings were done 
in alternate brain slices by fluorescence microscopy (Fig. 2e and Extended Data 
Fig. 4a). Histology was done blind to the identity of the animals. 

Nif was identified based on regions of stronger staining and higher cell density than surrounding areas; identification was additionally guided by proximate anatomical landmarks (for example, HVC, the lamina mesopallialis and the lamina pallio-subpallialis).

Lesions. Location and size of the lesions were determined by estimating the extent 
of necrotic tissue (that is, loss of neurons and gliosis) in photomicrographs of cresyl 
violet stained sections as previously described”. Lesion size was expressed as a percentage of estimated Nif size, measured in intact controls (0.035 ± 0.001 mm?, n = 4 birds). In pharmacologically lesioned birds, 100% of Nif was lesioned (Extended Data Fig. 3b). In electrolytically lesioned birds, 0-100% of Nif was lesioned (Fig. 3j).
Inactivations. Fluorescent dye-conjugated dextrans (0.5 mg ml⁻¹ Alexa 594;
Invitrogen) were co-injected with the final injection of muscimol for post-hoc 
verification of the injection site (Fig. 2e and Extended Data Fig. 4a). Fluorescence 
images of the sections were superimposed on those of their adjacent cresyl violet 
sections (Adobe Photoshop) to determine locations of fluorescence in relation 
to Nif. All injection sites were found to be within the target nucleus (Extended 
Data Fig. 4a). 

Neural circuit tracing. To visualize Nif (Fig. 2e and Extended Data Fig. 3a), fluorescent dye-conjugated cholera toxin subunit B (1 mg ml⁻¹, Alexa 488; Invitrogen)
was injected into HVC in 2 birds (83 nl per hemisphere). Twenty-one days after 
surgery, the animals were euthanized, perfused, and their brains fixed, sectioned,
and mounted. Photomicrographs of fluorescent sections were overlaid on those
of adjacent cresyl violet sections (Adobe Photoshop) to determine the location 
of fluorescence in relation to anatomical landmarks and density of cell bodies. 
Data analysis of song. Syllable segmentation and annotation. Raw audio record- 
ings were segmented into syllables as previously described”. Spectrograms were 
calculated for all prospective syllables, and a neural network (5,000 input layer, 
100 hidden layer, 3-10 output layer neurons) was trained to identify syllable types 
using a test data set created manually by visual inspection of song spectrograms. 
Accuracy of the automated annotation was verified by visual inspection of a subset 
of syllable spectrograms. 

Syllable feature quantification. All non-call vocalizations were characterized by 
their duration and mean Wiener entropy—both robust acoustic features that 
are tightly controlled in adult zebra finch song”®. Syllable durations were esti- 
mated from threshold crossings of the acoustic power as previously described”’. 
Wiener entropy, a measure of acoustic randomness, was calculated using Sound 
Analysis for MATLAB’ for 10 ms time windows, advancing in steps of 1 ms, 
such that entropy was computed for every millisecond. The entropy measure- 
ments were averaged across the syllable and log-transformed. On this scale, the 
Wiener entropies of white noise and of a pure tone are zero and minus infinity, 
respectively. 
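The Wiener entropy computation can be illustrated with a simplified Python sketch. It is not Sound Analysis for MATLAB: it uses a single Hann-tapered FFT per 10 ms window (an assumption; SAP uses multitaper spectra), computes the ratio of the geometric to the arithmetic mean of the power spectrum in each window, averages that ratio across the syllable, and then takes the log, following the order described above. The test signals are synthetic.

```python
import numpy as np


def mean_log_wiener_entropy(x, fs, win_s=0.010, step_s=0.001):
    """Log of the syllable-averaged Wiener entropy (spectral flatness).

    Per window: geometric mean / arithmetic mean of the power spectrum.
    Windows are 10 ms long and advance by 1 ms; the ratios are averaged, then logged.
    """
    n_win = int(round(win_s * fs))
    n_step = int(round(step_s * fs))
    taper = np.hanning(n_win)  # single-taper estimate (SAP uses multitapers)
    ratios = []
    for start in range(0, len(x) - n_win + 1, n_step):
        power = np.abs(np.fft.rfft(x[start:start + n_win] * taper)) ** 2
        power = np.maximum(power[1:], np.finfo(float).tiny)  # drop DC, avoid log(0)
        ratios.append(np.exp(np.mean(np.log(power))) / np.mean(power))
    return float(np.log(np.mean(ratios)))


fs = 44100
t = np.arange(int(0.1 * fs)) / fs                 # 100 ms synthetic 'syllables'
rng = np.random.default_rng(0)
noise_h = mean_log_wiener_entropy(rng.standard_normal(t.size), fs)
tone_h = mean_log_wiener_entropy(np.sin(2 * np.pi * 2000 * t), fs)
print(noise_h, tone_h)
```

With a single taper, white noise comes out near -0.6 rather than exactly zero. The log of an exponentially distributed periodogram value is biased low by Euler's constant (about 0.577), and multitaper averaging reduces that bias. The pure tone falls far below, as expected.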

Duration probability distributions. Histograms (1.25 ms bins) of syllable durations 
produced within 1 h of muscimol/PBS injections were generated for each experi- 
ment, normalized by total sample counts, averaged across 2-4 experiments within 
a bird, and then averaged across birds. Data from HVC-lesioned birds, provided by 
the authors of ref. 26, was recorded on the first day of singing after lesion (2-7 days 
after surgery) and analysed similarly. Mean duration distributions for all conditions 
were smoothed with a sliding boxcar window (7-bin width, 1-bin advance). 
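The duration-distribution steps can be sketched in Python as below. The sketch assumes a 0-300 ms range (borrowed from the joint distributions described next) and uses hypothetical per-bird, per-experiment duration arrays; a centred 7-bin convolution implements the boxcar smoothing.

```python
import numpy as np

BIN_MS = 1.25
EDGES = np.arange(0.0, 300.0 + BIN_MS, BIN_MS)  # assumed 0-300 ms range, 240 bins


def duration_distribution(durations_ms):
    """Histogram of syllable durations normalized by the total sample count."""
    counts, _ = np.histogram(durations_ms, bins=EDGES)
    return counts / counts.sum()


def boxcar_smooth(p, width=7):
    """Sliding boxcar average (7-bin width, 1-bin advance)."""
    return np.convolve(p, np.ones(width) / width, mode="same")


# Hypothetical data: {bird: [syllable durations (ms) from each experiment]}
rng = np.random.default_rng(0)
birds = {
    "bird1": [rng.normal(90, 10, 200), rng.normal(92, 10, 150)],
    "bird2": [rng.normal(110, 15, 300), rng.normal(108, 15, 250), rng.normal(112, 15, 180)],
}
# Average across experiments within each bird, then across birds, then smooth.
per_bird = [np.mean([duration_distribution(d) for d in exps], axis=0) for exps in birds.values()]
mean_dist = boxcar_smooth(np.mean(per_bird, axis=0))
print(mean_dist.sum())
```

Averaging within birds before averaging across birds keeps birds with more experiments from dominating the group distribution.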
Entropy-duration joint probability distributions. Two-dimensional histograms, 
showing the joint distributions of syllable duration and Wiener entropy, were 
created with bins of width 1.25 ms (duration axis; range: 0-300 ms) and 0.025 
(log Wiener entropy axis; range: -4 to 0). The histogram was normalized by total
sample counts to construct an empirical probability distribution. Because these 
empirical distributions were sparsely sampled, we estimated the true probability 
distribution by smoothing the empirical distribution with a point-spread func- 
tion (2D Gaussian; width: 7 bins; sigma: 3 bins). Distributions were calculated 
for vocalizations produced during the following time windows. Bilateral lesion 
experiments: the first 2 h of singing each day; inactivation experiments: 2 h before
(pre), 1 h after (post), and 6-8 h after (washout) injection; unilateral lesion: the
first 2 h of singing each day, the first hour of post-lesion singing, and the last 4 h of
singing on the day of lesion. 
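A Python sketch of the joint distribution with the stated bin widths follows. Because a 2-D Gaussian is separable, the 7 × 7 point-spread function (sigma of 3 bins) is applied as two 1-D passes. Renormalizing after smoothing, to restore mass lost at the histogram edges, is an assumption of this sketch, and the input data are synthetic.

```python
import numpy as np

DUR_EDGES = np.arange(0.0, 300.0 + 1.25, 1.25)  # 1.25 ms bins, 0-300 ms
ENT_EDGES = np.linspace(-4.0, 0.0, 161)         # 0.025-wide bins of log Wiener entropy


def gaussian_kernel_1d(width=7, sigma=3.0):
    """Normalized 1-D Gaussian; the 2-D kernel is its outer product (separable)."""
    x = np.arange(width) - (width - 1) / 2
    k = np.exp(-0.5 * (x / sigma) ** 2)
    return k / k.sum()


def joint_distribution(durations_ms, log_entropies):
    """Smoothed empirical joint probability of syllable duration and log entropy."""
    h, _, _ = np.histogram2d(durations_ms, log_entropies, bins=[DUR_EDGES, ENT_EDGES])
    p = h / h.sum()
    k = gaussian_kernel_1d()
    # Smoothing along each axis in turn equals convolving with the 7x7 Gaussian.
    p = np.apply_along_axis(np.convolve, 0, p, k, mode="same")
    p = np.apply_along_axis(np.convolve, 1, p, k, mode="same")
    return p / p.sum()  # renormalize mass lost at the edges (assumption)


rng = np.random.default_rng(0)
p = joint_distribution(rng.normal(100, 20, 2000), rng.normal(-2.0, 0.3, 2000))
print(p.shape, p.sum())
```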

Distribution similarity measurement. To quantify changes in song elements, we 
calculated the first Wasserstein distance, a common metric of the difference in 
probability distributions, between syllable entropy-duration distributions for songs 
produced at different time points or under various experimental conditions (see 
text). We used an implementation in MATLAB and C available at http://www.ariel.ac.il/sites/ofirpele/FastEMD/. Distances between bins were Euclidean.
Calculations were based on 50,000 samples drawn from the entropy-duration 
probability distributions and reported in figures as the mean distance per sample. 
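The paper used FastEMD on 50,000 samples. As a swapped-in technique suited to small examples, the Python sketch below draws equal numbers of samples from two binned distributions and solves the optimal one-to-one matching with SciPy's linear assignment solver. For equally weighted samples of equal size the optimal transport plan is a matching, so this gives the first Wasserstein distance between the two samples, reported as mean distance per sample in bin units. Its roughly cubic cost makes it impractical at 50,000 samples, and the sample count and seed here are arbitrary.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist


def sample_bins(p, n, rng):
    """Draw n (row, col) bin coordinates from a 2-D probability array p."""
    flat = rng.choice(p.size, size=n, p=p.ravel())
    return np.column_stack(np.unravel_index(flat, p.shape)).astype(float)


def wasserstein1_per_sample(p, q, n=500, seed=0):
    """Mean Euclidean transport distance (in bins) between n samples from p and from q."""
    rng = np.random.default_rng(seed)
    a, b = sample_bins(p, n, rng), sample_bins(q, n, rng)
    cost = cdist(a, b)  # Euclidean ground distance between bins
    rows, cols = linear_sum_assignment(cost)
    return float(cost[rows, cols].mean())


p = np.zeros((20, 20))
p[10, 10] = 1.0
q = np.zeros((20, 20))
q[13, 14] = 1.0
print(wasserstein1_per_sample(p, q, n=50))  # all mass moves sqrt(3**2 + 4**2) bins -> 5.0
print(wasserstein1_per_sample(p, p, n=50))  # identical distributions -> 0.0
```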
Motif completion rate. For each bird, a 3-5 syllable dominant song motif was iden- 
tified by visual inspection of spectrograms. Motif completion rates (MCR) were 
calculated as:

$$\mathrm{MCR} = \frac{\text{Number of utterances of complete motifs}}{\text{Number of utterances of the first syllable in the motif}}$$


For all birds, motif completion rates were calculated for the first two hours of sing- 
ing per day; for unilaterally lesioned birds, rates were also calculated for the first 
hour of singing following lesion. ‘Intact’ motif completion rates (Fig. 4f) were based 
on a subset of the lesioned birds (four from the ‘bilateral’; three from ‘unilateral’ 
group) but collected 1-2 weeks before the Nif lesions. Data from each bird was 
normalized to pre-lesion motif completion rates for comparison across animals. 
See Extended Data Fig. 5 for examples of truncated motifs. 
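A minimal Python sketch of the motif completion rate follows. It assumes each song bout has been annotated as a string with one character per syllable (a hypothetical encoding) and that motifs do not overlap. Rates are then normalized to a pre-lesion value, as described above; the bouts are made up.

```python
def motif_completion_rate(bouts, motif):
    """Complete motifs divided by utterances of the motif's first syllable.

    bouts: list of strings, one character per annotated syllable (assumed encoding).
    """
    n_complete = sum(bout.count(motif) for bout in bouts)  # non-overlapping matches
    n_first = sum(bout.count(motif[0]) for bout in bouts)
    return n_complete / n_first if n_first else float("nan")


pre = ["iiabcdabcd", "iabcdabcdabcd"]   # hypothetical pre-lesion bouts
post = ["iiabcdab", "iabcabcd"]         # hypothetical post-lesion bouts with truncations
baseline = motif_completion_rate(pre, "abcd")
print(motif_completion_rate(post, "abcd") / baseline)  # 2 of 4 motifs completed -> 0.5
```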

Motif duration stretch. The durations of the dominant song motifs were calcu- 
lated as previously described”? for interval durations. For all birds, the mean motif 
duration was calculated for 100 consecutive renditions, taken at the same time 
each day (~1 h after lights on in the morning). For unilaterally lesioned birds, the
mean duration was also calculated for the first 100 identifiable motifs produced 
immediately after lesion. As noted above, ‘intact’ data were collected from birds 
that were later lesioned. Motif durations were normalized to pre-lesion values for 
comparison across animals. See Extended Data Fig. 5 for examples of aligned and 
excluded vocalizations. 

Data analysis of neural recordings. Spontaneous activity. To record spontaneous 
HVC activity, minute-long recordings were made every 15 min. These recordings
were bandpass filtered (1-5 kHz; 2-pole Butterworth; zero-phase) and segments
within 500 ms of vocalization-related activity were marked for exclusion from sub- 
sequent analysis. Individual spikes were detected by an amplitude threshold set to 
3-8 standard deviations of the estimated noise in the recordings. For each bird, 
the spontaneous firing rates were normalized to the mean firing rate in the two 
hours before lesion. Shown in Fig. 3c is the across-bird mean and standard error, 
smoothed with a sliding boxcar window (5 bin width, 1 bin advance). 
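The filtering, threshold and smoothing steps can be sketched with SciPy. An order-1 Butterworth band-pass design has two poles. `filtfilt` runs it forwards and backwards for zero phase, which squares its magnitude response. Several choices here are assumptions not stated in the text: the median-based noise estimate (median |x| / 0.6745), the negative-going threshold, the 2 ms dead time and the synthetic test signal. The exclusion of segments within 500 ms of singing is omitted.

```python
import numpy as np
from scipy.signal import butter, filtfilt


def bandpass(x, fs, lo=1000.0, hi=5000.0):
    """Zero-phase 1-5 kHz band-pass (an order-1 Butterworth band-pass has 2 poles)."""
    b, a = butter(1, [lo, hi], btype="bandpass", fs=fs)
    return filtfilt(b, a, x)  # forward + backward pass: zero phase


def detect_spikes(x, fs, k=5.0, dead_time_s=0.002):
    """Sample indices where the filtered trace first drops below -k * noise s.d."""
    y = bandpass(x, fs)
    sigma = np.median(np.abs(y)) / 0.6745  # robust noise estimate (assumption)
    below = y < -k * sigma
    onsets = np.flatnonzero(below[1:] & ~below[:-1]) + 1
    dead = int(round(dead_time_s * fs))
    spikes, last = [], None
    for i in onsets:  # merge crossings closer together than the dead time
        if last is None or i - last >= dead:
            spikes.append(int(i))
            last = i
    return np.array(spikes, dtype=int)


def boxcar(values, width=5):
    """Sliding boxcar mean, `width` bins wide, advanced one bin at a time."""
    return np.convolve(values, np.ones(width) / width, mode="valid")


fs = 30_000
rng = np.random.default_rng(1)
x = rng.normal(size=fs)              # 1 s of unit-variance noise
for start in range(1500, fs, 3000):  # ten synthetic 0.5 ms negative pulses
    x[start:start + 15] -= 50.0
print(len(detect_spikes(x, fs, k=6.0)))  # number of detected events
print(boxcar(np.arange(1.0, 7.0)))       # [3. 4.]
```

A threshold of k = 6 lies inside the 3-8 s.d. range given above. The spontaneous rate per 1-min recording would then be the number of detected events divided by the non-excluded duration.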
Alignment of the neural recordings to song. A dynamic time warping (DTW) algo- 
rithm was used to align individual song motifs to a common template as previously 
described”’. The warping path derived from this alignment was then applied to 
the corresponding HVC recordings with a premotor lead of 35 ms”’. The aligned 
neural traces were squared (to calculate signal power) and smoothed (5 ms boxcar 
window, 1 ms advance). 
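The alignment can be sketched in Python with a textbook dynamic time warping algorithm. It uses an absolute-difference cost on 1-D song features (for example, amplitude envelopes), which is not necessarily the variant used by the authors. The sketch assumes three things:
- the neural trace has already been resampled to the song's frame rate;
- the trace has already been shifted by the 35 ms premotor lead;
- the trace is sampled at 1 kHz, so a 5-sample boxcar corresponds to 5 ms with a 1 ms advance.

All function names are hypothetical.

```python
import numpy as np


def dtw_path(query, template):
    """Classic DTW with steps (1,0), (0,1), (1,1); returns the warping path as
    (query_index, template_index) pairs and the accumulated cost."""
    q, t = np.asarray(query, float), np.asarray(template, float)
    n, m = len(q), len(t)
    cost = np.abs(q[:, None] - t[None, :])
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = cost[i - 1, j - 1] + min(acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
    # Backtrack from the end, always moving to the cheapest predecessor.
    i, j = n, m
    path = [(i - 1, j - 1)]
    while (i, j) != (1, 1):
        i, j = min([(i - 1, j - 1), (i - 1, j), (i, j - 1)], key=lambda s: acc[s])
        path.append((i - 1, j - 1))
    return path[::-1], float(acc[n, m])


def warp_to_template(trace, path, template_len):
    """Apply the song-derived path to a (lead-shifted) neural trace by averaging
    all samples that the path assigns to each template frame."""
    sums, counts = np.zeros(template_len), np.zeros(template_len)
    for i, j in path:
        sums[j] += trace[i]
        counts[j] += 1
    return sums / counts


def power_envelope(aligned, width=5):
    """Square the aligned trace (signal power) and smooth with a `width`-sample boxcar."""
    return np.convolve(np.asarray(aligned) ** 2, np.ones(width) / width, mode="same")


template = np.array([0.0, 1.0, 2.0, 3.0])
song = np.repeat(template, 2)  # the same motif sung at half speed
path, dist = dtw_path(song, template)
print(dist)                                         # 0.0
print(warp_to_template(song, path, len(template)))  # [0. 1. 2. 3.]
```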
HVC activity correlation. The recovery of temporal dynamics in HVC was calcu- 
lated as the Pearson's correlation between the song-aligned neural power imme- 
diately before lesion and the same at different times after lesion. The running 
correlation in Fig. 3f shows Pearson's correlation between the mean song-aligned 
activity pattern of pre-lesion songs on the day of lesion and the mean activity 
patterns in a sliding window of 25 song motifs. The pre-lesion data point in 
Fig. 3g represents the correlation between the mean power envelopes for two con- 
secutive blocks of 25 motifs recorded immediately before lesion. Normal drift in 
the song-related HVC signal (‘control’) was calculated similarly. 
HVC mean power. The mean HVC power was calculated per motif and averaged 
over the 25-motif windows as described above for the correlation. For analyses 
pooled across birds, mean HVC power was normalized to the pre-lesion value. 
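The running correlation and the normalized mean power can be sketched as follows. Each motif is assumed to be a row of song-aligned HVC power. The one-motif window step and the helper names are assumptions, since the text specifies only the 25-motif window.

```python
import numpy as np


def window_starts(n_items, window=25, step=1):
    """Start indices of consecutive windows of `window` motifs."""
    return range(0, n_items - window + 1, step)


def running_correlation(reference, motifs, window=25, step=1):
    """Pearson's r between the pre-lesion mean envelope and each window's mean envelope."""
    motifs = np.asarray(motifs, float)  # hypothetical (n_motifs, n_timepoints) array
    return np.array([
        np.corrcoef(reference, motifs[s:s + window].mean(axis=0))[0, 1]
        for s in window_starts(len(motifs), window, step)
    ])


def normalized_mean_power(motifs, pre_lesion_power, window=25, step=1):
    """Mean power per motif, averaged over the same windows, relative to pre-lesion."""
    per_motif = np.asarray(motifs, float).mean(axis=1)
    means = [per_motif[s:s + window].mean() for s in window_starts(len(per_motif), window, step)]
    return np.array(means) / pre_lesion_power


rng = np.random.default_rng(2)
ref = rng.random(300)                     # hypothetical pre-lesion mean envelope
post = np.tile(2.0 * ref + 1.0, (30, 1))  # 30 motifs, each a scaled copy of ref
print(running_correlation(ref, post).round(6))  # six windows
```

Because Pearson's r ignores scale and offset, the correlation measures the recovery of the temporal pattern, while the normalized mean power tracks overall activity levels.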
Day versus night recovery. Recovery of HVC activity in the first 60 h following
lesion, during which most of the post-lesion recovery occurred, was parsed into
three daytime and two night-time intervals. Daytime recovery was calculated as the
change in correlation to pre-lesion activity (or normalized mean power) between 
the first 25 motifs in the morning (or immediately following lesion) and the last 
25 motifs that evening; night-time recovery is the change between the last 25 motifs 
of the day and the first 25 of the subsequent morning. 
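The split into daytime and night-time recovery reduces to two differences per day. The sketch assumes one hypothetical (first 25 motifs, last 25 motifs) value pair per day, where each value is either the correlation to pre-lesion activity or the normalized mean power.

```python
def day_night_recovery(blocks):
    """blocks: hypothetical list, one entry per day, of (morning, evening) values
    for the first and last 25 motifs; day 0 starts right after the lesion."""
    day = [evening - morning for morning, evening in blocks]
    night = [blocks[d + 1][0] - blocks[d][1] for d in range(len(blocks) - 1)]
    return day, night


day, night = day_night_recovery([(0.2, 0.4), (0.5, 0.6), (0.7, 0.8)])
print(day)    # about [0.2, 0.1, 0.1]: three daytime intervals
print(night)  # about [0.1, 0.1]: two night-time intervals
```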
Modelling. Network architecture. On the basis of previous experimental 
findings***4, we modelled the HVC network as a synfire-chain of bursting neu- 
rons. The model consisted of 1,200 integrate-and-burst neurons organized into 
80 nodes. Each of the 15 neurons in a node projected to all neurons in the next 
node, forming a chain topology. 
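The chain topology can be written down as the connectivity matrix $M$ used in the synaptic input below. $M_{ij} = 1$ when neuron $j$ projects to neuron $i$ in the next node. This is a minimal sketch with a hypothetical function name.

```python
import numpy as np

N_NODES, PER_NODE = 80, 15  # 80 x 15 = 1,200 neurons


def chain_connectivity(n_nodes=N_NODES, per_node=PER_NODE):
    """M[i, j] = 1 if neuron j (node k) projects to neuron i (node k + 1)."""
    node = np.arange(n_nodes * per_node) // per_node  # node index of every neuron
    # The post-synaptic node must be exactly one step downstream of the pre-synaptic node.
    return (node[:, None] == node[None, :] + 1).astype(np.int8)


M = chain_connectivity()
print(M.shape, int(M.sum()))  # (1200, 1200) 17775, i.e. 79 * 15 * 15 synapses
```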

The subthreshold membrane potential of the $i$-th neuron, $V_i$, obeys:

$$C\,\frac{dV_i}{dt} = -g_L\,(V_i - V_L) + I_{\mathrm{syn},i} + I_{\mathrm{Nif},i} + \sigma\sqrt{\tau_m}\,\eta_i(t)$$

where $C = 1$ µF/cm² is the membrane capacitance, $g_L = 0.1$ mS/cm² is the leak conductance, $V_L = -60$ mV is the leak potential, $I_{\mathrm{syn},i}$ is the synaptic input, $I_{\mathrm{Nif},i}$ represents external input to the HVC neurons from Nif, and $\eta_i(t)$ is zero-mean Gaussian white noise with covariance $\langle \eta_i(t)\,\eta_i(t')\rangle = \delta(t - t')$; $\tau_m = 10$ ms and $\sigma = 200$ nA/cm². The synaptic input is given by $I_{\mathrm{syn},i}(t) = W \sum_j M_{ij} \sum_k \epsilon(t - t_j^k)$, where $t_j^k$ denotes the $k$-th spike of the $j$-th neuron, $\epsilon(t) = \Theta(t)\,e^{-t/\tau_s}$ with $\Theta(t)$ being the step function and $\tau_s = 5$ ms, $W = 87$ nA/cm², and $M_{ij}$ is 1 for synapses from a neuron $j$ to the neurons $i$ in the next node and 0 otherwise. The Nif input is a different waveform for each HVC neuron and does not change across simulations. The waveforms were randomly generated by simulating an Ornstein-Uhlenbeck process with an autocorrelation time scale of 50 ms, starting from a random initial point. Noise and drift were chosen such that the resulting waveforms had a mean of 97 nA/cm² and a standard deviation of 53 nA/cm². When the membrane potential of the integrate-and-burst neuron reaches threshold, $V_{th} = -50$ mV, the neuron emits 4 spikes at 2 ms intervals, modelling the bursts generated by calcium spikes in RA-projecting HVC neurons*”*®, and the membrane potential is reset to $V_{\mathrm{reset}} = -55$ mV after a refractory period of 4 ms. Chain propagation was started by a 5 ms pulse input with magnitude 6.7 µA/cm² to the neurons in the first node. The parameters of the model were chosen to approximate the results of our experiments. Some of these parameters were subject to change as explained below.
Homeostatic regulation of neural activity. We implemented three different homeo- 
static plasticity rules, each of which can adaptively modify the excitability of HVC 
neurons. 

Rule 1: if during a simulated chain propagation a neuron did not spike, its spiking threshold decreased by 1 µV. If the neuron produced more than 8 spikes or 2 bursts, the threshold increased by 1 µV. This rule is used in Fig. 4 and Extended Data Fig. 6a. Such homeostatic changes in spiking thresholds have been observed in experiments”.

Rule 2: if during a simulated chain propagation a neuron did not spike, the leak conductance of the neuron decreased by 0.1 µS/cm². If the neuron produced more than 8 spikes or 2 bursts, the leak conductance increased by 0.1 µS/cm². This rule amounts to changing the neuron’s input resistance, defined as the change in membrane potential in response to injected current, divided by the current. This rule is used in Extended Data Fig. 6b. Homeostatic changes to input resistance have also been observed in experiments*®.

Rule 3: if during a simulated chain propagation a neuron did not spike, all synaptic weights to that neuron increased by 6.7 pA/cm². If the neuron produced more than 8 spikes or 2 bursts, the synaptic weights decreased by 6.7 pA/cm². This rule is used in Extended Data Fig. 6c. Activity-dependent homeostatic changes to a neuron’s synaptic inputs have been observed in experiments, for example, in cortical neurons”.
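The three rules can be expressed as one per-neuron update applied after each simulated propagation. The sketch uses the model's units (mV, mS/cm², µA/cm²), so 1 µV is 0.001 mV, 0.1 µS/cm² is 0.0001 mS/cm², and 6.7 pA/cm² is 6.7e-6 µA/cm². It treats "more than 8 spikes" and "more than 2 bursts" as equivalent, which holds for 4-spike bursts. The function and argument names are hypothetical.

```python
import numpy as np

D_THRESHOLD = 1e-3  # 1 uV, in mV
D_LEAK = 1e-4       # 0.1 uS/cm^2, in mS/cm^2
D_WEIGHT = 6.7e-6   # 6.7 pA/cm^2, in uA/cm^2
MAX_SPIKES = 8      # more than 8 spikes (= more than 2 bursts of 4) is "too active"


def homeostatic_update(spike_counts, v_th, g_leak, w_in, rule):
    """Return updated per-neuron thresholds, leak conductances and incoming weights.
    Silent neurons become more excitable; over-active neurons become less excitable."""
    counts = np.asarray(spike_counts)
    silent, busy = counts == 0, counts > MAX_SPIKES
    v_th, g_leak, w_in = (np.array(a, dtype=float) for a in (v_th, g_leak, w_in))
    if rule == 1:    # spiking threshold
        v_th[silent] -= D_THRESHOLD
        v_th[busy] += D_THRESHOLD
    elif rule == 2:  # leak conductance, i.e. input resistance
        g_leak[silent] -= D_LEAK
        g_leak[busy] += D_LEAK
    elif rule == 3:  # all synaptic weights onto the neuron
        w_in[silent] += D_WEIGHT
        w_in[busy] -= D_WEIGHT
    else:
        raise ValueError("rule must be 1, 2 or 3")
    return v_th, g_leak, w_in


v_th, g, w = homeostatic_update([0, 4, 12], [-50.0] * 3, [0.1] * 3, [0.087] * 3, rule=1)
print(v_th)  # about [-50.001, -50.0, -49.999]
```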

In Fig. 4 and Extended Data Fig. 6, a ‘motif’ was considered complete if at
least one neuron in each of the 80 nodes produced a spike. Motif duration was
calculated as the time from the propagation initiation until the average spike time 
of the neurons in the last node. We ran simulations with modified parameters to 
verify that our results presented in Fig. 4 were qualitatively robust. 
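A single chain propagation, with the completion and duration measures defined above, can be sketched with an Euler-Maruyama integration. The choices below are the sketch's, not the authors':
- a 0.1 ms time step;
- currents expressed in µA/cm²;
- an exactly discretized Ornstein-Uhlenbeck process for the Nif input;
- a simplification under which a neuron starts a new burst only after its previous burst has finished.

The projections $W\sum_j M_{ij} s_j$ are summed node by node, which is equivalent to using the full matrix $M$ for this topology. The usage call switches off both noise sources to give a deterministic check.

```python
import numpy as np

# Parameters from the text in mV, ms, uF/cm^2, mS/cm^2 and uA/cm^2 (87 nA/cm^2 -> 0.087).
N_NODES, PER_NODE = 80, 15
C, G_L, V_L = 1.0, 0.1, -60.0
V_TH, V_RESET, T_REF = -50.0, -55.0, 4.0
TAU_M, TAU_S = 10.0, 5.0
SIGMA, W = 0.200, 0.087
NIF_MEAN, NIF_SD, NIF_TAU = 0.097, 0.053, 50.0
PULSE_AMP, PULSE_DUR = 6.7, 5.0
N_BURST, ISI = 4, 2.0  # 4 spikes per burst, 2 ms apart


def simulate(t_max=800.0, dt=0.1, sigma=SIGMA, nif_sd=NIF_SD, noise_seed=0, nif_seed=123):
    """One chain propagation; returns a list of spike times (ms) per neuron."""
    rng = np.random.default_rng(noise_seed)    # membrane noise: differs between runs
    rng_nif = np.random.default_rng(nif_seed)  # Nif waveforms: fixed across runs
    n = N_NODES * PER_NODE
    first_node = np.arange(n) < PER_NODE
    v = np.full(n, V_L + NIF_MEAN / G_L)       # start at the mean resting potential
    s = np.zeros(n)                            # traces: sum over k of eps(t - t_j^k)
    nif = NIF_MEAN + nif_sd * rng_nif.standard_normal(n)  # OU started at a random point
    a_nif, decay = np.exp(-dt / NIF_TAU), np.exp(-dt / TAU_S)
    in_ref, ref_until = np.zeros(n, bool), np.zeros(n)
    burst_start, burst_k = np.full(n, np.inf), np.full(n, N_BURST)  # N_BURST: no burst running
    spikes = [[] for _ in range(n)]
    for step in range(int(round(t_max / dt))):
        t = step * dt
        # Emit burst spikes that are due; each adds 1 to the neuron's synaptic trace.
        due = (burst_k < N_BURST) & (burst_start + ISI * burst_k <= t + 1e-9)
        s[due] += 1.0
        burst_k[due] += 1
        for i in np.flatnonzero(due):
            spikes[i].append(t)
        # W * sum_j M_ij s_j, summed node-wise because M links node k to node k + 1.
        drive = s.reshape(N_NODES, PER_NODE).sum(axis=1)
        i_syn = np.concatenate([np.zeros(PER_NODE), W * np.repeat(drive[:-1], PER_NODE)])
        i_ext = np.where(first_node & (t < PULSE_DUR), PULSE_AMP, 0.0)
        # End of the refractory period: reset the membrane potential.
        release = in_ref & (t >= ref_until)
        v[release] = V_RESET
        in_ref &= ~release
        # Euler-Maruyama step of C dV/dt = -g_L (V - V_L) + I_syn + I_Nif + sigma sqrt(tau_m) eta.
        dv = (-G_L * (v - V_L) + i_syn + nif + i_ext) * dt / C
        dv += sigma * np.sqrt(TAU_M * dt) * rng.standard_normal(n) / C
        v[~in_ref] += dv[~in_ref]
        # A threshold crossing starts a 4-spike burst (only once the previous burst ended).
        fire = ~in_ref & (v >= V_TH) & (burst_k >= N_BURST)
        burst_start[fire], burst_k[fire] = t + dt, 0
        in_ref |= fire
        ref_until[fire] = t + dt + T_REF
        # Decay the traces and advance the Ornstein-Uhlenbeck Nif input.
        s *= decay
        nif = NIF_MEAN + (nif - NIF_MEAN) * a_nif + nif_sd * np.sqrt(1 - a_nif**2) * rng_nif.standard_normal(n)
    return spikes


def motif_metrics(spikes):
    """Complete if every node has a spiking neuron; duration = mean last-node spike time."""
    nodes = [spikes[k * PER_NODE:(k + 1) * PER_NODE] for k in range(N_NODES)]
    complete = all(any(len(sp) > 0 for sp in node) for node in nodes)
    last = [t for sp in nodes[-1] for t in sp]
    return complete, (float(np.mean(last)) if last else float("nan"))


spk = simulate(sigma=0.0, nif_sd=0.0)  # deterministic check: no noise, constant Nif drive
complete, duration = motif_metrics(spk)
print(complete, round(duration, 1))
```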
Statistical analysis. All statistics on data pooled across animals are reported in the
main text as mean ± s.d. and depicted in figure error bars as mean ± s.e.m. Where
appropriate, distributions passed tests for normality (Kolmogorov-Smirnov), equal
variance (Levene), and/or sphericity (Mauchly), unless otherwise noted. Tests corrected
for multiple comparisons were used where justified. Statistical tests for specific
experiments were performed as described below.
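The distinction between the s.d. reported in the text and the s.e.m. shown as error bars, together with a normality check, can be sketched as follows. The sketch assumes SciPy, and the helper names are hypothetical. A plain Kolmogorov-Smirnov test on z-scored data is used as an illustration. Because the mean and s.d. are estimated from the same data, this test is conservative; the Lilliefors variant corrects for this.

```python
import numpy as np
from scipy import stats


def summarize(values):
    """Mean, s.d. (reported in the text) and s.e.m. (figure error bars)."""
    x = np.asarray(values, float)
    sd = x.std(ddof=1)  # sample standard deviation
    return x.mean(), sd, sd / np.sqrt(len(x))


def ks_normality_p(values):
    """Kolmogorov-Smirnov test of the z-scored data against N(0, 1)."""
    x = np.asarray(values, float)
    z = (x - x.mean()) / x.std(ddof=1)
    return stats.kstest(z, "norm").pvalue


mean, sd, sem = summarize([1, 2, 3, 4, 5])
print(mean, round(sd, 4), round(sem, 4))  # 3.0 1.5811 0.7071
print(stats.levene([1, 2, 3, 4], [2, 4, 6, 8]).pvalue)  # equal-variance check
```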

Fig. 1e. Comparison of the fraction of trials with IPIs within 20% of the target for
different experimental treatments (n = 5 rats). Mauchly’s test indicated a violation
of sphericity (W = 0.134, P = 0.049), and a Huynh-Feldt degrees of freedom
correction was applied. Subsequent repeated-measures ANOVA revealed signif- 
icant differences between the treatments (F(1.17,4.59) = 35.7, P= 0.002). Post-hoc 
comparisons using Dunnett's test showed significant differences between PBS 
(control) and muscimol injections (P= 0.0002), but not between PBS and base- 
line (P=0.99). 
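The Huynh-Feldt-corrected repeated-measures ANOVA can be sketched in NumPy/SciPy. The sketch computes the one-way F statistic and the Greenhouse-Geisser epsilon from the double-centred covariance matrix. It then applies the original Huynh-Feldt adjustment, capped at 1, and scales both degrees of freedom by that epsilon. Mauchly's test and Dunnett's post-hoc comparisons are not included, and the function name is hypothetical.

```python
import numpy as np
from scipy import stats


def rm_anova_huynh_feldt(data):
    """One-way repeated-measures ANOVA; `data` is (n_subjects, k_conditions)."""
    y = np.asarray(data, float)
    n, k = y.shape
    grand = y.mean()
    ss_cond = n * ((y.mean(axis=0) - grand) ** 2).sum()
    ss_subj = k * ((y.mean(axis=1) - grand) ** 2).sum()
    ss_err = ((y - grand) ** 2).sum() - ss_cond - ss_subj
    df1, df2 = k - 1, (n - 1) * (k - 1)
    f_stat = (ss_cond / df1) / (ss_err / df2)
    # Greenhouse-Geisser epsilon from the double-centred covariance matrix.
    cov = np.cov(y, rowvar=False)
    dc = cov - cov.mean(axis=0) - cov.mean(axis=1)[:, None] + cov.mean()
    eps_gg = np.trace(dc) ** 2 / ((k - 1) * (dc ** 2).sum())
    # Huynh-Feldt adjustment of the GG estimate, capped at 1.
    denom = (k - 1) * (n - 1 - (k - 1) * eps_gg)
    eps_hf = 1.0 if denom <= 0 else min(1.0, (n * (k - 1) * eps_gg - 2) / denom)
    p = stats.f.sf(f_stat, eps_hf * df1, eps_hf * df2)  # corrected degrees of freedom
    return f_stat, eps_gg, eps_hf, p


f_stat, eps_gg, eps_hf, p = rm_anova_huynh_feldt([[1, 2, 3], [2, 3, 5], [3, 5, 6]])
print(round(f_stat, 6), round(eps_gg, 3), round(eps_hf, 3), p)
```

The corrected degrees of freedom are fractional, as in the F(1.17, 4.59) reported above. SciPy's F distribution accepts non-integer degrees of freedom.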

Fig. 1h. Effect of optogenetic stimulation of motor cortex on task performance. 
A two-tailed, paired t-test revealed significant differences in performance in the 
light off and light on conditions (n = 5 rats; P=3 x 10~°).

Fig. 2c. Comparison of Wasserstein distances between joint entropy-duration 
distributions before and after bilateral Nif lesions (n =5 birds). Repeated-measures 
ANOVA showed no significant difference on any day (F(3,12)= 2.21, P=0.14). 

Fig. 2g. Same as Fig. 2c, but comparing pre-injection songs to songs after musci- 
mol/PBS injections (n =5 birds). Mauchly’s test indicated a violation of sphericity 
(W = 1.9 × 10⁻⁴, P = 0.014), and a Huynh-Feldt degrees of freedom correction was
applied. Subsequent repeated-measures ANOVA revealed significant differences 
between the treatments (F(1.36,5.43)= 19.7, P= 0.004). Post-hoc comparisons using 
Dunnett's test showed significant differences between PBS (control) and musci- 
mol injections (P= 1 x 10~*); no other condition significantly differed from PBS 
(P>0.92). 

Fig. 2h. Comparison of the syllable duration distributions following HVC 
lesions (n=5 birds) and Nif inactivations (n= 5 birds). A Kolmogorov-Smirnov 
test on the mean distribution across animals showed no significant differences 
(P=0.24). 

Fig. 3e. Comparison of Wasserstein distances between joint entropy-duration 
distributions before and after unilateral lesions to Nif (n = 4 birds). Repeated- 
measures ANOVA revealed that lesions produced significant differences in song 
structure (F(5,15)= 17.7, P=8 x 10~°). Post-hoc comparisons using Dunnett’s test 
showed significant differences from baseline until the second day after lesion (post 
and 8h: P< 0.001; 1 day: P=0.002; P > 0.05 thereafter). 

Fig. 3g. Comparisons of HVC dynamics in intact controls and following Nif 
lesions. A two-tailed, paired t-test revealed significant differences in correlation 
immediately before and after lesion (n = 4 birds; P= 0.003). In addition, two-tailed 
unpaired t-tests showed significant differences between lesion and control condi- 
tions at matched time points until the third day post-lesion (P< 0.03 before, P=0.1 
at the start of the third day). 

Fig. 3h. Comparisons of normalized HVC activity in intact controls and follow- 
ing Nif lesion. A two-tailed, paired t-test revealed significant differences in activity 
immediately before and after lesion (n = 4 birds; P= 0.002). In addition, two-tailed 
unpaired t-tests showed significant differences between lesion and control con- 
ditions at matched time points until the third day post-lesion (P< 0.03 before, 
P=0.29 at the end of the third day). 

Fig. 3i. Top, comparison of recovery of correlation to pre-lesion HVC dynamics 
during day and night (n= 4 birds). Two-tailed one-sample t-tests revealed signif- 
icant recovery overnight but not during the day (test against mean zero; P=0.01 
and P= 0.053, respectively). Bottom, comparison of recovery of HVC activity to 
pre-lesion levels during day and night (n= 4 birds). Two-tailed one-sample t-tests 
revealed significant recovery during the day but not overnight (test against mean 
zero; P=0.007 and P= 0.48, respectively). 
Fig. 3j. Top, correlation to pre-lesion HVC dynamics immediately following
Nif lesions as a function of the fraction of Nif lesioned (n = 11 birds).
A two-tailed t-test revealed the Pearson’s linear correlation coefficient, R = −0.91,
to be significantly different from zero (P = 1 × 10⁻⁴). Bottom, normalized
HVC activity immediately following Nif lesions as a function of the fraction 
of Nif lesioned (n = 11 birds). A two-tailed t-test revealed the Pearson’s linear 
correlation coefficient, R= —0.87, to be significantly different from zero 
(P=5x 10‘). 
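The two-tailed t-test on Pearson's R can be reproduced from R and n alone, using $t = R\sqrt{n-2}/\sqrt{1-R^2}$ with $n - 2$ degrees of freedom. This sketch assumes SciPy. Because the reported R values are rounded, the resulting P values are only approximate.

```python
import math

from scipy import stats


def pearson_p_two_tailed(r, n):
    """Two-tailed P for H0: rho = 0, via t = r * sqrt(n - 2) / sqrt(1 - r^2), df = n - 2."""
    t = r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)
    return 2.0 * stats.t.sf(abs(t), n - 2)


print(pearson_p_two_tailed(-0.91, 11))  # on the order of 1e-4
print(pearson_p_two_tailed(-0.87, 11))
```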

Fig. 4f. Top, comparison of post-lesion motif completion rates to pre-lesion 
baseline for unilateral (n = 4 birds) and bilateral (n =5 birds) Nif lesions. Repeated 
measures ANOVA revealed that lesions resulted in significant reductions of com- 
pletion rates in unilateral (F(g,j6)=7.0, P=5 x 10~‘), but not bilateral (Fo,1s)= 4.1, 
P=0.07), lesions. Post-hoc analysis of the unilateral lesion data using Dunnett’s 
test showed motif completion rates to be significantly different from pre-lesion 
on the day of lesion (P=5 x 10~“), but not thereafter (P > 0.11). Bottom, com- 
parison of post-lesion motif tempo to pre-lesion baseline for birds with unilateral 
(n=4 birds) and bilateral (n =5 birds) Nif lesions. Repeated measures ANOVA 
revealed that Nif lesions had a significant effect on motif tempo in both unilateral 
(F(g,16)= 10.4, P=4.6 x 10°) and bilateral (F(6,1s)= 17.5, P=1.3 x 10°) condi- 
tions. Post-hoc analysis using Dunnett's test showed motif tempo was slowed down 
for both unilateral and bilateral lesions: the effects remained significant (P < 0.05) 
throughout the 7 days in the bilaterally lesioned birds, and through the first 4 days
in unilaterally lesioned birds.
Code availability. All custom-written code will be made available upon request. 
Extended Data Figure 1 | Light stimulation of motor cortical neurons expressing the optogenetic activator Chrimson. a, Representative example of AAV injections into motor cortex, showing Chrimson-tdTomato expression at different magnifications in a coronal brain section (~1.5 mm anterior to bregma). The scheme of the brain (right) is adapted from Paxinos’ rat atlas. The estimated spread of the injections was 8.3 ± 1.3 mm³ (mean ± s.d., n = 2 rats), with an average of 31 ± 2% infected cells (Methods). b, Heatmap showing the instantaneous firing rates of 28 single units recorded in an anaesthetized rat in response to a 1 s light pulse, averaged over 30 stimulations (Methods). c, A custom-built, battery-operated wireless optogenetic stimulation device, consisting of a printed circuit board with an integrated IR sensor and LED (λ = 615 nm). The IR sensor gates the circuit and allows the LED to be triggered by an IR light source placed on top of the rat’s cage. During surgery, the LED is affixed atop a small craniotomy above motor cortex.
Extended Data Figure 2 | Both brief and sustained optogenetic stimulation of motor cortex cause significant performance deficits in our task. a, Optogenetic stimulation was triggered on the first lever press in a trial, and lasted for either 50 ms or 1 s. b, Both sustained (1 s, left, compare Fig. 1h, n = 5 rats, P=3 x 10>, paired t-test) and brief (50 ms, right, n = 3 rats, P = 0.01, paired t-test) optogenetic activation of motor cortex interfered with normal task performance. c, Comparing the effects of the two stimulation protocols on task performance (ratio light on/light off) shows that sustained stimulation has a significantly larger effect (1 s: n = 5; 50 ms: n = 3, P = 0.004, unpaired t-test). Error bars represent s.e.m.
Extended Data Figure 3 | Localization and lesioning of Nif. a, Injection of fluorescently labelled cholera toxin subunit B (green) into HVC retrogradely labels Nif and anterogradely labels the downstream control nucleus RA. b, Bilateral injections of the excitotoxin NMA produced focal lesions of Nif. Shown are Nissl-stained sections from both hemispheres in the same example bird. Red arrows indicate the estimated boundaries of Nif; the dashed green line shows the extent of the lesion.
Extended Data Figure 4 | Muscimol injections into Nif. a, Top, Nissl-stained parasagittal section of a zebra finch brain. Middle, magnified view of the region demarcated with a green square above. Red arrows (left) indicate the estimated boundaries of Nif; the violet overlay (right) shows the spread of fluorescent dye co-injected with muscimol. The orange star indicates the estimated centre of injection based on the brightness of the fluorescence. Bottom, estimated injection sites relative to the boundaries of Nif for all muscimol-injected birds. Colours denote different animals. b, Syllable spectrograms (left) and entropy-duration distributions (right) for a bird injected with different volumes of muscimol in Nif. Example spectrograms for 9 nl and 27 nl injections are from recordings made 3 min and 7 min after the injections, respectively. Song disruption was similarly rapid and severe for both volumes, and injections above Nif had no effect; together, these observations argue against the possibility that the effects on song were due to diffusion of the drug into HVC.
Extended Data Figure 5 | Spectrograms of vocalizations following unilateral Nif lesion. Data for the example bird in Fig. 3. All examples were recorded within the first hour of singing after lesion. Top, example spectrogram of a motif recorded just before lesion. Left, example spectrograms of vocalizations in which motif syllables could not be reliably identified and thus were excluded from subsequent analysis. Middle, example spectrograms of identifiable motifs that were included in the alignment-dependent analysis (Fig. 3f-j). Right, example spectrograms of songs with identifiable syllables, but truncated motifs.
Roots and leaves of healthy plants host taxonomically structured bacterial assemblies, and members of these communities
contribute to plant growth and health. We established Arabidopsis leaf- and root-derived microbiota culture collections 
representing the majority of bacterial species that are reproducibly detectable by culture-independent community 
sequencing. We found an extensive taxonomic overlap between the leaf and root microbiota. Genome drafts of 400 
isolates revealed a large overlap of genome-encoded functional capabilities between leaf- and root-derived bacteria with 
few significant differences at the level of individual functional categories. Using defined bacterial communities and a 
gnotobiotic Arabidopsis plant system we show that the isolates form assemblies resembling natural microbiota on their 
cognate host organs, but are also capable of ectopic leaf or root colonization. While this raises the possibility of reciprocal 
relocation between root and leaf microbiota members, genome information and recolonization experiments also provide 
evidence for microbiota specialization to their respective niche. 


Plants and animals harbour abundant and diverse bacterial micro- 
biota’. These taxonomically structured bacterial communities have 
important functions for the health of their multicellular eukaryotic 
hosts“. The leaf and root microbiota of flowering plants have been 
extensively studied by culture-independent analyses, which have 
consistently revealed the co-occurrence of four main bacterial phyla: 
Actinobacteria, Bacteroidetes, Firmicutes and Proteobacteria’). 
Determinants of microbiota composition at lower taxonomic ranks, 
that is, at genus and species level, are host compartment, environmental 
factors and host genotype®”!'®, 

Soil harbours an extraordinarily rich diversity of bacteria and these
define the start inoculum of the Arabidopsis thaliana root micro- 
biota®’. The inoculum source of the leaf microbiota is thought to be 
more variable owing to the inherently open nature of the leaf ecosys- 
tem, probably involving bacteria transmitted by aerosols, insects, or 
soil®?!”. A recent study of the grapevine (Vitis vinifera) microbiota
showed that the root-associated bacterial assemblies differed sig- 
nificantly from aboveground communities, but that microbiota of 
leaves, flowers, and grapes shared a greater proportion of taxa with 
soil communities than with each other, suggesting that soil may serve 
as a common bacterial reservoir for belowground and aboveground 
plant microbiota'®. 

A major limitation of current plant microbiota research is the lack 
of systematic microbiota culture collections that can be employed in 
microbiota reconstitution experiments with germ-free plants to address 
principles underlying community assembly and proposed microbiota
functions for plant health under laboratory conditions’®.


Bacterial culture collections from roots and leaves 

We employed three bacterial isolation procedures to establish taxo- 
nomically diverse culture collections of the A. thaliana root and leaf 
microbiota. Bacterial isolates were recovered from pooled or individual
roots or leaves of healthy plants using colony picking from agar
plates, limiting dilution in liquid media in 96-well microtitre plates,
or microbial cell sorting (see Methods). We adopted a two-step
barcoded pyrosequencing protocol” for taxonomic classification of the
cultured bacteria by determining >550-base-pair (bp) 16S ribosomal
RNA (rRNA) gene sequences (Supplementary Fig. 1; Methods). In
parallel, part of the root and leaf material was used for
cultivation-independent 16S rRNA gene community sequencing to cross-reference
Operational Taxonomic Unit (OTU)-defined taxa from the microbiota 
with individual colony forming units (CFUs) in the culture collections. 

A total of 5,812 CFUs were recovered from 59 independently 
pooled A. thaliana root samples of plants mainly grown in Cologne 
soil, Germany, whereas 2,131 CFUs were retrieved from leaf washes 
of individual leaves collected from A. thaliana populations at six locations
near Tübingen, Germany, or Zurich, Switzerland (Supplementary
Data 1). Recovery estimates for root-associated OTUs were calculated 
using the culture-independent community profiles of the present 
and two earlier studies*!” and varied for the top 100 OTUs (70% of 
sequencing reads) between 54-65% and at >0.1% relative abundance
(RA) between 52-64% (Methods; Extended Data Fig. 1a-c;
Supplementary Data 2). For leaf samples, the culture-independent 
16S rRNA gene analyses from individual and pooled leaves (60 sam- 
ples from six sites) revealed similar community profiles at all tested 
geographic sites and high leaf-to-leaf consistency (Extended Data 
Fig. 2). Recovery estimates of the top 100 leaf-associated bacterial 
OTUs (86% of all sequencing reads) were 54% and at >0.1% RA 47% 
(Extended Data Fig. 1d). The root-derived CFUs correspond to 23 of 
38 and the leaf-derived CFUs belong to 28 of 45 detectable bacterial 
families. Root- and leaf-derived CFUs each represent all four bacterial 
phyla typically associated with A. thaliana roots and leaves. Thus, most 
bacterial families that are reproducibly associated with A. thaliana 
roots and leaves have culturable members. 
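The recovery estimates can be read as the fraction of OTUs in a given set that have at least one matching isolate. The set is either the top 100 OTUs or all OTUs above 0.1% RA. This is a minimal Python sketch under that reading. The OTU table, the matched-OTU set and the function name are hypothetical, and the exact procedure is described in the Methods.

```python
def recovery_estimate(otu_abundance, cultured, top_n=None, min_ra=None):
    """Fraction of OTUs that have at least one matching isolate.

    otu_abundance: hypothetical dict OTU id -> relative abundance (fraction of reads).
    cultured: set of OTU ids matched by an isolate (e.g. at >97% 16S identity).
    The OTU set is restricted to the `top_n` most abundant and/or RA > `min_ra`.
    """
    ranked = sorted(otu_abundance, key=otu_abundance.get, reverse=True)
    if top_n is not None:
        ranked = ranked[:top_n]
    if min_ra is not None:
        ranked = [o for o in ranked if otu_abundance[o] > min_ra]
    if not ranked:
        return float("nan")
    return sum(o in cultured for o in ranked) / len(ranked)


ra = {"OTU1": 0.30, "OTU2": 0.20, "OTU3": 0.0005, "OTU4": 0.10}
print(recovery_estimate(ra, {"OTU1", "OTU3"}, top_n=2))       # 0.5
print(recovery_estimate(ra, {"OTU1", "OTU3"}, min_ra=0.001))  # one of three OTUs
```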
Figure 1 | Taxonomic overlap between At-RSPHERE and At-LSPHERE isolates and their representation in culture-independent microbiota profiling studies. a, b, Phylogenetic trees of At-RSPHERE (a; n = 206 isolates) and At-LSPHERE (b; n = 224 isolates) bacteria. Their taxonomic overlap is shown in the outermost ring (green or brown triangles). a, Representation of At-RSPHERE bacteria in each of four indicated culture-independent profiling studies of the A. thaliana root microbiota; root-associated OTUs with RAs >0.1% (dark orange) or <0.1% (light orange). b, Representation of At-LSPHERE bacteria in the two indicated culture-independent phyllosphere profiling studies; leaf-associated OTUs with RAs >0.1% (dark green) or <0.1% (light green). Taxonomic assignment and phylogenetic tree inference were based on partial 16S rRNA gene Sanger sequences.


At-RSPHERE and At-LSPHERE culture collections

We selected from the aforementioned culture collections a taxonomically representative core set of bacterial strains after Sanger sequencing of a >550 bp fragment of the 16S rRNA gene and additional strain purification (Methods). To increase the intra-species genetic diversity of the culture collections, and because the quantitative contribution of a single isolate to its corresponding OTU cannot be estimated, we included bacterial strains sharing >97% 16S rRNA gene sequence identity (widely used for bacterial species definition), but representing independent host colonization events, that is, recovered from different plant roots or leaves. In total we selected 206 root-derived isolates that comprise 28 bacterial families belonging to four phyla (designated At-RSPHERE) and 224 leaf-derived isolates that comprise 29 bacterial families belonging to five phyla (designated At-LSPHERE) (Extended Data Fig. 3a, b; Supplementary Data 1; Methods). Additionally, to represent abundant soil OTUs (>0.1% RA) we selected 33 bacterial isolates encompassing eight bacterial families belonging to three phyla from unplanted Cologne soil (Extended Data Fig. 3c).

Notably, the majority of the At-RSPHERE isolates share >97% 16S rRNA gene sequence identity with root-associated OTUs reported in four independent studies in which A. thaliana plants had been grown in Cologne soil®!” or other European®”” or US soils’ (inner four circles in Fig. 1a; Methods). Similarly, the bulk of At-LSPHERE isolates match leaf-derived OTUs detected in A. thaliana populations at the Tübingen/Zurich locations or in US-grown plants (innermost two circles in Fig. 1b). This indicates that representatives of the majority of At-RSPHERE and At-LSPHERE members co-populate the corresponding A. thaliana organs in multiple tested environments on two continents, Europe and North America.

Phylogenetic analysis based on 16S rRNA gene Sanger sequences revealed that 119 out of 206 At-RSPHERE isolates (58%) share >97% sequence identity with corresponding 16S rRNA gene fragments of At-LSPHERE members (outermost circle in Fig. 1a).
Similarly, 108 out of 224 At-LSPHERE isolates (48%) share >97% 
sequence identity matches with At-RSPHERE members (outermost 
circle in Fig. 1b). This extensive overlap both at the rank of bacterial 
genera and bacterial families (20 out of 38 detectable families) between 
leaf- and root-derived bacteria is notable because we collected leaf and 
root specimens from environments that are geographically widely separated
(>500 km), and it is consistent with a previous report on leaf and
root microbiota overlap in V. vinifera'’. This overlap is corroborated 
by the corresponding culture-independent leaf and root community 
profiles (Extended Data Fig. 4). As essentially all A. thaliana root- 
associated bacteria are recruited from the surrounding soil biome®”?, 
this raises the possibility that unplanted soil also defines the start inocu- 
lum for a substantial proportion of the leaf microbiota with subsequent 
selection for niche-adapted organisms. 
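The >97% 16S rRNA gene identity criterion used above can be illustrated with a percent-identity function on two sequences that are already aligned. The alignment step itself is not shown. Definitions of identity differ in how gaps are counted; this sketch ignores columns where both sequences have a gap and counts every other column. The function names are hypothetical.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity of two pre-aligned sequences of equal length ('-' = gap)."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    cols = [(x, y) for x, y in zip(a.upper(), b.upper()) if not (x == "-" and y == "-")]
    matches = sum(x == y and x != "-" for x, y in cols)
    return 100.0 * matches / len(cols)


def same_species(a: str, b: str, threshold: float = 97.0) -> bool:
    """Apply the >97% identity cut-off commonly used for bacterial species."""
    return percent_identity(a, b) > threshold


print(percent_identity("ACGT-A", "ACGTTA"))  # 5 of 6 columns match
```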


Comparative genome analysis of the culture collections 
To characterize the functional capabilities of the core culture collections 
we subjected each isolate to whole-genome sequencing and generated 
a total of 432 high-quality draft genomes (206 from leaf, 194 from root 
and 32 from soil; Supplementary Data 3). Taxonomic assignment of the 
whole-genome sequences confirmed that these isolates span a broad 
taxonomic range, belonging to 35 different bacterial families distrib- 
uted across five phyla (Supplementary Data 4). 

Based on the whole-genome taxonomic information, we grouped the 
isolates into family-level clusters. We found that clusters of genomes 
are characterized by a relatively large core-genome, with an average 
of 33.6% of the annotated proteins present in each member and a 
smaller fraction of singleton genes identified in only one genome per 
cluster (14.0%). Detailed analysis of phylogenetic diversity of each clus- 
ter revealed a substantial overlap between leaf, root and soil isolates 
(Supplementary Data 5). Many clusters showed no clear separation of 
isolates based on their ecological niche, suggesting shared core functions. However, other clusters contained isolates from only one organ or showed clear separation among them, suggesting niche specialization within some clusters (Supplementary Data 5). We then examined the functional diversity between the sequenced isolates in order to determine whether the observed phylogenetic overlap corresponded with functional similarities between leaf and root isolates. Principal coordinates analysis (PCoA) of functional distances (Fig. 2a; Methods) revealed a clear clustering of genomes on the basis of their taxonomy, but only limited separation of genomes on the basis of their ecological compartment. Taken together, both phylogenetic and functional diversification of the genomes are strongly driven by their taxonomic affiliation and only weakly by the ecological niche.

Figure 2 | Analysis of functional diversity between sequenced isolates. a, Principal coordinate analysis (PCoA) plot depicting functional distances between sequenced genomes (n = 432) based on the KEGG Orthology (KO) database annotation. Each point represents a genome. Colours represent the organ of isolation and shapes correspond to their taxonomy. Numbers inside the plot refer to bacterial families listed in b. b, Analysis of functional diversity within bacterial families as measured by pair-wise functional distances between genomes (bottom panel; n = 432). Higher pair-wise distances between members of a family indicate a larger degree of functional diversity. Only families with at least five members are shown. The histogram (top panel) was calculated for the entire data set and the y-axis corresponds to the percentage of data points in each bin. Boxplot whiskers extend to the most extreme data point that is no more than 1.5 times the interquartile range from the upper or lower quartile.

We examined the functional diversity within each bacterial family (Fig. 2b) in order to identify bacterial taxa with varying degrees of functional versatility. Families belonging to Actinobacteria show a lower functional diversity (average pair-wise distance 0.37) compared to those belonging to Bacteroidetes, Firmicutes and especially Proteobacteria (average pair-wise distance 0.65), which exhibit a higher degree of within-family functional diversification, even though all family-level groups have a comparable degree of phylogenetic relatedness. Among these groups, Pseudomonadaceae, Oxalobacteraceae and Methylobacteriaceae members show the highest functional heterogeneity, compared to Microbacteriaceae strains, which we identified as the least functionally diverse family (Fig. 2b).

We searched for signatures of niche specialization at the level of individual functional categories, using enrichment analysis to identify functional categories over-represented in genomes from root and leaf or soil isolates (Fig. 3; Methods). Specifically, we found the category ‘carbohydrate metabolism’ to be enriched in the leaf and soil genomes compared to those isolated from roots (Mann-Whitney test, P= 1.29 x 107’; Fig. 3b). We speculate that this differential enrichment could reflect the availability of simple carbon sources in roots through the process of root exudation (sugars, amino acids, aliphatic acids)”!?, whereas bacteria associated with leaves or unplanted soil might rely on a more diverse repertoire of carbohydrate metabolism genes to access scarce and complex organic carbon, for example, polysaccharides and leaf cuticular waxes. The category ‘xenobiotics biodegradation and catabolism’ is enriched in the root genomes with respect to those isolated from
leaves (P=2.60 x 107"); Fig. 3b), which is consistent with previous evi- 
dence that genes for aromatic compound utilization are expressed in 
the rhizosphere”’. No single taxon is responsible for these significant
differences; rather, they appear to be a general feature of the sequenced
bacterial genomes from the respective ecological niche (Extended Data
Figs 5 and 6). Interestingly, we observed the same trends of differen- 
tial abundance of functional categories in V. vinifera root metagenome 
samples!* compared to their respective unplanted soil controls 
(Extended Data Fig. 7). 
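The enrichment comparisons reported above rely on a two-sided Mann-Whitney test with Bonferroni correction (Methods). Below is a minimal, dependency-free sketch of that procedure. It assumes each genome is summarized by the percentage of its annotated KO terms that fall in a given category. It uses the large-sample normal approximation with a tie correction and no continuity correction. The function names and example values are hypothetical and do not come from the authors' scripts.

```python
import math
from collections import Counter


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney(x, y):
    """Two-sided Mann-Whitney U test (normal approximation, tie-corrected,
    no continuity correction). Returns (U statistic for x, p value)."""
    n1, n2 = len(x), len(y)
    pooled = list(x) + list(y)
    ranks = average_ranks(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    n = n1 + n2
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    var = n1 * n2 / 12 * ((n + 1) - ties / (n * (n - 1)))
    if var <= 0:  # all values identical: no evidence of a shift
        return u1, 1.0
    z = (u1 - n1 * n2 / 2) / math.sqrt(var)
    return u1, math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))


def bonferroni(p_values):
    """Multiply each p value by the number of tests, capping at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


if __name__ == "__main__":
    # Hypothetical per-genome percentages for one functional category.
    root = [8.1, 7.9, 8.4, 7.7, 8.0]
    leaf = [9.2, 9.6, 8.9, 9.9, 9.4]
    u, p = mann_whitney(root, leaf)
    print(u, p, bonferroni([p, 0.2, 0.04]))
```

For small groups such as the 32 soil genomes, an exact permutation p value would be a reasonable alternative to the normal approximation shown here.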

Together, these findings indicate a substantial overlap of functional 
capabilities in the genomes of the Arabidopsis leaf- and root-derived 
culture collections and differences at the level of individual functional 
categories that may reflect specialization of the leaf and root microbiota 
to their respective niche. Additional genomic signatures for niche- 
specific colonization are likely to be hidden in genes for which a func- 
tional annotation is currently unavailable (~57%). 


Synthetic community colonization of germ-free plants 

We colonized germ-free A. thaliana plants with synthetic communities (SynComs) consisting of bacterial isolates from our culture collections to assess their potential for host colonization in a gnotobiotic system containing calcined clay as an inert soil substitute (Methods). To mimic the taxonomic diversity of leaf and root microbiota in natural environments, we employed two main SynComs: ‘L’, comprising 218 leaf-derived bacteria, and ‘R+S’, consisting of 188 members, of which 158 are root-derived and 30 are soil-derived bacteria (Supplementary Data 6). Input SynComs were inoculated into calcined clay directly before sowing of surface-sterilized seeds and/or spray-inoculated on leaves of three-week-old germ-free plants. For all defined communities we examined three independent SynCom preparations, each tested in three closed containers with four plants per container. We employed 16S rRNA gene community profiling with a method validated for defined communities to detect potential community shifts between input and output SynComs in samples of seven-week-old roots, leaves or unplanted clay. In this community analysis, ‘indicator’ OTUs represent either a single strain or a known group of isolates (Supplementary Data 6).

Upon application of the input R+S SynCom to clay (‘R+S in clay’) and co-cultivation with A. thaliana plants for seven weeks, we retrieved reproducible R+S output communities from clay (without host), root and leaf compartments (Supplementary Fig. 2). These output SynCom profiles were robust against a 75% reduction in the RA of Proteobacteria relative to Actinobacteria, Bacteroidetes and Firmicutes in the input R+S SynCom (input ratios 1:1:1:1 or 1:1:1:0.25, respectively), as confirmed by PCoA (Fig. 4a). PCoA also revealed distinct output communities in each of the three tested compartments (Fig. 4a; P < 0.001; Extended Data Fig. 8a, b). This indicates that a marked host-independent community change occurred in clay (without host), in addition to host-dependent community shifts specific to leaves and roots. Next, we tested the ‘L’ SynCom of leaf-derived bacteria by spray inoculation on leaves of three-week-old plants. After four weeks of co-incubation with the L SynCom, output communities were detected in leaves and roots (Supplementary Fig. 3). PCoA revealed that these two output communities differed from each other but were robust against a 75% reduction in the RA of input Proteobacteria (Fig. 4b; Supplementary Fig. 3; P < 0.001; Extended Data Fig. 8c, d). The convergence of output communities despite varying RAs in the input SynComs suggests that the communities had reached a steady state. These experiments also reveal that members of both the R+S and L SynComs not only colonize their cognate host organs but are also capable of ectopic colonization of leaves and roots, which might be linked to the extensive species overlap of A. thaliana leaf and root microbiota in natural environments (Fig. 1a, b). Additionally, this provides experimental support for the hypothesis that a subset of leaf-colonizing bacteria originates from unplanted soil. It also raises the possibility of reciprocal bacterial colonization events between roots and leaves during and/or after the establishment of the respective microbiota, for example, by ascending migration of rhizobacteria from roots to leaves. Upon leaf spray application of SynComs, a small number of leaf bacteria is likely to land on the clay surface and thereafter colonize roots. This is not fundamentally different from processes occurring in natural environments, for example, during rain showers and/or leaf dehiscence.

Figure 3 | Functional analysis of sequenced isolates. a, Phylogeny of family-level clusters of bacterial isolates. The tips of the tree are annotated, from left to right, with the cluster ID and taxonomic classification, followed by the number of sequenced isolates from leaf, root or soil that constitute each cluster. The heat map depicts the average percentage of annotated proteins of each cluster belonging to each functional category. b, Functional enrichment analysis between leaf (n = 206), root (n = 194) and soil (n = 32) genomes. Points and bars correspond to the mean abundance and standard deviation of each functional category. P values were obtained using the non-parametric Mann-Whitney test corrected by the Bonferroni approach. c, Analysis of pan-genome distribution for each cluster of genomes, indicating the percentage of annotated proteins found in only one isolate (singletons), in more than one but not all (shell) or in all genomes within the cluster (core).
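The pan-genome partition in Fig. 3c (singletons, shell, core) reduces to a counting rule. The sketch below assumes each genome in a cluster is represented by the set of annotations carried by its annotated proteins, for example KO identifiers. It reports the percentage of distinct annotations in each class. The function name and toy data are hypothetical, and the sketch does not address how paralogues were counted, which the text does not specify.

```python
from collections import Counter


def pan_genome_partition(cluster):
    """Classify each annotation in a cluster of genomes as core, shell or singleton.

    `cluster` maps a genome ID to the set of annotations found in that genome.
    Returns the percentage of distinct annotations in each class.
    """
    n_genomes = len(cluster)
    if n_genomes == 0:
        raise ValueError("empty cluster")
    # Number of genomes in which each annotation occurs.
    occurrences = Counter(a for annotations in cluster.values() for a in annotations)
    classes = Counter()
    for count in occurrences.values():
        if count == n_genomes:
            classes["core"] += 1        # present in every genome of the cluster
        elif count == 1:
            classes["singleton"] += 1   # present in exactly one genome
        else:
            classes["shell"] += 1       # more than one genome, but not all
    total = sum(classes.values())
    if total == 0:
        return {"core": 0.0, "shell": 0.0, "singleton": 0.0}
    return {name: 100.0 * classes[name] / total for name in ("core", "shell", "singleton")}


if __name__ == "__main__":
    toy_cluster = {
        "isolate_1": {"K00001", "K00002", "K00003"},
        "isolate_2": {"K00001", "K00002"},
        "isolate_3": {"K00001", "K00004"},
    }
    print(pan_genome_partition(toy_cluster))
    # -> {'core': 25.0, 'shell': 25.0, 'singleton': 50.0}
```

The core test runs first, so in a single-genome cluster every annotation is counted as core rather than singleton.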

A comparison of rank abundance profiles between indicator OTUs 
for all root- and leaf-derived isolates and corresponding OTUs iden- 
tified in the environmental root and leaf samples revealed similar 
trends at phylum, class and family levels (Extended Data Fig. 9). 
This validates the gnotobiotic plant system as a tool for microbiota 
reconstitution experiments. 
Figure 4 | SynCom colonization of germ-free A. thaliana plants.
a, b, Principal coordinate analysis (PCoA) of Bray-Curtis distances of input and output SynCom profiles of RS in clay (a; n = 60) and L spray (b; n = 42) experiments. Each condition was tested with 6 independently prepared SynComs; each preparation was used for 3 independent inoculations. L, leaf-derived strains; RS, root- and soil-derived strains; ER, equal strain ratio; UR, unequal strain ratio.
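The ordinations in Fig. 4 are based on Bray-Curtis distances between SynCom profiles. For two samples with abundances u and v over the same OTUs, the distance is the sum of |u_i − v_i| divided by the sum of (u_i + v_i). The sketch below illustrates this calculation. The profile representation, the conversion to relative abundances and the example counts are assumptions, not details taken from the paper.

```python
def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two abundance profiles of equal length."""
    if len(u) != len(v):
        raise ValueError("profiles must cover the same OTUs in the same order")
    total = sum(a + b for a, b in zip(u, v))
    if total == 0:
        return 0.0  # two empty samples are treated as identical
    return sum(abs(a - b) for a, b in zip(u, v)) / total


def relative_abundance(counts):
    """Convert read counts to proportions so that sequencing depth does not dominate."""
    depth = sum(counts)
    return [c / depth for c in counts] if depth else [0.0] * len(counts)


def distance_matrix(profiles):
    """Symmetric matrix of pairwise Bray-Curtis distances (the input to PCoA)."""
    n = len(profiles)
    return [[bray_curtis(profiles[i], profiles[j]) for j in range(n)] for i in range(n)]


if __name__ == "__main__":
    # Hypothetical read counts for four OTUs in one input and two output samples.
    samples = [[500, 300, 150, 50], [100, 700, 150, 50], [120, 650, 200, 30]]
    for row in distance_matrix([relative_abundance(s) for s in samples]):
        print(["%.3f" % d for d in row])
```

PCoA would then take the eigendecomposition of the double-centred squared distance matrix. That step is omitted here to keep the sketch free of external dependencies.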


Niche-specific microbiota establishment with SynComs 

The species overlap between root and leaf microbiota and their corresponding culture collections (Fig. 1a, b; Extended Data Fig. 4) prompted us to test whether the R+S and L SynComs contribute equally to root and leaf microbiota establishment. Both SynComs were pooled and inoculated in clay together with surface-sterilized A. thaliana seeds (designated ‘RSL in clay’; Fig. 5a). We also tested whether a preformed root-associated community can interfere with establishment of the leaf-associated community. After three weeks of co-cultivation, half of the plants grown with the ‘RSL in clay’ SynCom were spray-inoculated on leaves with the L SynCom supplemented with 15 root-derived strains (designated ‘RSL in clay & L+15R spray’). Plant organ-specific output communities were determined after a further four weeks of co-incubation. We also inoculated the L SynCom alone in clay and determined the output SynComs (designated ‘L in clay’; Fig. 5a).

We found significant differences between leaf-associated output communities of the ‘RSL in clay’ and ‘RS in clay’ experiments (Fig. 5b; P < 0.001; Extended Data Fig. 8f; Supplementary Figs 2 and 4). We also found that the output community on leaves after ‘L in clay’ inoculation is similar to the leaf outputs of ‘RSL in clay’ inoculation (Fig. 5b; P < 0.001; Extended Data Fig. 8f; Supplementary Figs 4 and 5). This indicates that, in this comparison, the leaf-derived SynCom has a stronger influence on leaf microbiota structure than root- and soil-derived bacteria. However, both the ‘RSL in clay’ and ‘L in clay’ leaf outputs are significantly different from the leaf output of the ‘L spray’ experiment (Fig. 5b; P < 0.001; Extended Data Fig. 8e; Supplementary Figs 3-5), showing that many leaf-derived isolates do not successfully colonize leaves when inoculated only in the clay environment. For example, three of the top 16 genera are grossly under-represented in leaf outputs of the ‘RSL in clay’ experiment compared with the ‘RSL in clay & L+15R spray’ experiment (Chryseobacterium, Sphingomonas and Variovorax; Supplementary Fig. 6), and these three genera are abundant in the natural leaf microbiota (Extended Data Fig. 4). Finally, leaf outputs were strikingly similar between the ‘RSL in clay & L+15R spray’ and ‘L spray’-only experiments (Fig. 5b; Supplementary Figs 3 and 7). This indicates that the L+15R SynCom, spray-inoculated on leaves three weeks after RSL application to clay, can displace the RSL leaf output. Collectively, these results support the hypothesis that leaf microbiota establishment benefits from both air- and soil-borne inoculation. We note, however, that our single application of bacteria to leaves does not mimic the continuous exposure of plant leaves to airborne microorganisms in nature.

Figure 5 | SynCom competition supports host-organ-specific community assemblies. a, Pictograms illustrating the ‘L spray’, ‘L in clay’, ‘RS in clay’, ‘RSL in clay’ and ‘RSL in clay & L+15R spray’ SynCom experiments. b, c, PCoA of Bray-Curtis distances of leaf (b; n = 69) and root (c; n = 69) outputs of the five experiments illustrated in a. R, root-derived isolates; S, soil-derived isolates; L, leaf-derived isolates. L in clay was tested with 6 independently prepared SynComs; the RSL in clay experiment was tested with 3 independently prepared SynComs, each used for 3 independent inoculations. All other experiments were tested with 6 independently prepared SynComs, and each preparation was used for 3 independent inoculations.

A comparison of the root-associated community outputs of the experiments described above revealed that the root output of the ‘RSL in clay’ experiment is more similar to that of the ‘RS in clay’ experiment than to that of the ‘L in clay’ experiment (Fig. 5c; P < 0.001; Extended Data Fig. 8g). This suggests that the root- and soil-derived SynCom has a stronger influence on root microbiota structure than the leaf-derived SynCom. In this experiment, the fractional contribution of root-specific indicator OTUs increases in the output relative to the input, whereas that of leaf-specific indicator OTUs decreases, pointing to a potential adaptation of root-derived bacteria for root colonization (Extended Data Fig. 10a; Mann-Whitney test; P < 0.05). This is further supported by the observation that, in the ‘RSL in clay’ experiment, root colonization rates are higher for root-specific indicator OTUs than for leaf-specific ones when applying a 0.1% relative abundance threshold in at least one biological replicate (69% and 33%, respectively). Taken together, this suggests that root-derived bacteria are better adapted to colonize their cognate host niche than leaf-derived bacteria. Further comparisons of the root-associated output communities of the ‘L in clay’ and ‘L spray’ experiments (Fig. 5c; Supplementary Figs 3 and 5) revealed similar community composition. This indicates convergence of ectopic root-associated community outputs despite different inoculation time points or sites of application. Additional reciprocal transplantation experiments used an ‘R’ (root strains only) SynCom, applied either to clay (‘R in clay’) or by spray inoculation (‘R spray’). These experiments confirmed the convergence of ectopic community outputs for root-derived bacteria on leaves as well (Extended Data Fig. 10b, c; Supplementary Figs 8 and 9). Convergence of ectopic SynCom outputs is consistent with the hypothesis that a subset of leaf- and root-colonizing bacteria has the potential to relocate between leaves and roots.
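The colonization-rate comparison above (69% versus 33% of indicator OTUs) rests on a simple rule: an indicator OTU counts as colonizing if its relative abundance reaches 0.1% in at least one biological replicate. The sketch below implements that rule and also computes the fractional contribution of an indicator set. It assumes each replicate is a mapping from OTU ID to read count and treats the threshold as inclusive; both are assumptions. The OTU names and counts are hypothetical.

```python
def relative_abundances(counts):
    """Map OTU -> proportion of the reads in one replicate."""
    depth = sum(counts.values())
    return {otu: (n / depth if depth else 0.0) for otu, n in counts.items()}


def colonization_rate(replicates, indicator_otus, threshold=0.001):
    """Percentage of indicator OTUs reaching `threshold` (0.1%) relative abundance
    in at least one replicate. The threshold is treated as inclusive (an assumption)."""
    if not indicator_otus:
        raise ValueError("no indicator OTUs given")
    ra = [relative_abundances(r) for r in replicates]
    colonizers = [otu for otu in indicator_otus
                  if any(rep.get(otu, 0.0) >= threshold for rep in ra)]
    return 100.0 * len(colonizers) / len(indicator_otus)


def fractional_contribution(counts, indicator_otus):
    """Summed relative abundance of a set of indicator OTUs in one sample."""
    ra = relative_abundances(counts)
    return sum(ra.get(otu, 0.0) for otu in indicator_otus)


if __name__ == "__main__":
    reps = [{"otu_a": 9000, "otu_b": 1000, "otu_c": 0},
            {"otu_a": 9999, "otu_b": 0, "otu_c": 1}]
    print(colonization_rate(reps, ["otu_b", "otu_c"]))  # otu_c never reaches 0.1%
    print(fractional_contribution(reps[0], ["otu_b", "otu_c"]))
```

Comparing `fractional_contribution` between input and output samples of the same experiment gives the increase or decrease of root- and leaf-specific indicator OTUs described above.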


Conclusions 

By employing systematic bacterial isolation approaches, we estab- 
lished expandable culture collections of the A. thaliana leaf- and root- 
associated microbiota, which capture the majority of the species found 
reproducibly in their respective natural communities (>0.1% relative 
abundance). The sequenced bacterial genomes as well as any future 
updates are available at http://www.at-sphere.com. These resources 
together with the remarkable reproducibility of the gnotobiotic recon- 
stitution system enable future studies on bacterial community estab- 
lishment and functions under laboratory conditions. 


Online Content Methods, along with any additional Extended Data display items 
and Source Data, are available in the online version of the paper; references 
unique to these sections appear only in the online paper. 
METHODS


Sampling of A. thaliana plants and isolation of root-, leaf- and soil-derived 
bacteria. A. thaliana plants were either harvested from natural populations or 
grown in different natural soils and used for bacterial isolations by colony picking, 
limiting dilution or bacterial cell sorting as well as 16S rRNA gene-based commu- 
nity profiling. To obtain a library of representative root colonizing bacteria, A. 
thaliana plants were grown in different soils (50.958 N, 6.856 E, Cologne, Germany; 
52.416 N, 12.968 E, Golm, Germany; 50.982 N, 6.827 E, Widdersdorf, Germany; 
47.941 N, 04.012 W, Saint-Evarzec, France; 48.725 N, 3.989 W, Roscoff, France) and 
harvested before bolting. Briefly, Arabidopsis roots were washed twice in washing buffers (10 mM MgCl2 for limiting dilution and PBS for colony picking) on a shaking platform for 20 min at 180 rpm and then homogenized twice with a Precellys24 tissue lyser (Bertin Technologies) using 3 mm metal beads at 5,600 rpm for 30 s.
Homogenates were diluted and used for isolation approaches on several bacterial 
growth media (Supplementary Data 7). For isolations based on colony picking, 
diluted cell suspensions were plated on solidified media and incubated, and isolates were picked from plates containing fewer than 20 colony-forming units (CFUs) after a maximum of two weeks of incubation. For limiting dilution, homogenized roots from each root pool were sedimented for 15 min, and the supernatant was empirically diluted, distributed and cultivated in 96-well microtitre plates. In parallel to the isolation of root-derived bacteria, roots of plants grown in Cologne soil were harvested and used to assess bacterial diversity by culture-independent 16S rRNA gene sequencing. Additionally, soil-derived bacteria were extracted from unplanted Cologne soil by washing the soil with PBS buffer supplemented with 0.02% Silwet L-77, and were subjected to bacterial isolation as well as 16S rRNA gene
community profiling. For the isolation of representative phyllosphere strains, 
naturally grown Arabidopsis plants were collected at eight different sites in southern 
Germany and Switzerland (six main sampling sites used for bacterial isolations 
and community profiling: 47.4090306 N, 8.470169444 E, Hoengg, Switzerland; 
47.474825 N, 8.305008333 E, Baden, Switzerland; 47.4816806 N, 8.217547222 E,
Brugg, Switzerland; 48.5560194 N, 9.134944444 E, Farm, Tuebingen, Germany; 
48.5989861 N, 9.201655556 E, Haeslach, Germany; 48.602682 N, 9.213247258 
E, Haeslach, Germany; and two additional sites only used for bacterial isolation: 
47.4074722 N, 8.50825 E, Zurich, Switzerland; 47.4227222 N, 8.548666667 E, 
Seebach, Switzerland) during spring and autumn of 2013 and used for bacterial 
isolations as well as 16S rRNA gene profiling. Leaf-colonizing bacteria of individual leaves were washed off by alternating steps of intense mixing and sonication. The suspension was subsequently filtered (CellTrics filters, 10 µm, Partec GmbH, Görlitz, Germany) to remove remaining plant particles, debris and cell aggregates. It was then used for cell sorting on a BD FACS Aria III (BD Biosciences) as well as for plating on different media (Supplementary Data 1 and 7). All isolates were subsequently stored in 30% or 40% glycerol at −80 °C.

Culture-independent bacterial 16S rRNA gene profiling of A. thaliana leaf, 
root and corresponding soil samples. Parts of A. thaliana leaves, roots and corre- 
sponding unplanted soil samples used for bacterial isolation were also processed for 
bacterial 16S rRNA gene community profiling using 454 pyrosequencing. Frozen 
root and corresponding soil samples were homogenized in Lysing Matrix E tubes (MP Biomedicals) at 5,600 rpm for 30 s, and DNA was extracted from all samples using the FastDNA SPIN Kit for Soil (MP Biomedicals) according to the manufacturer’s instructions. Lyophilized leaf samples were transferred
into 2 ml microcentrifuge tubes containing one metal bead and subsequently 
homogenized twice for 2 min at 25 Hz using a Retsch tissue lyser (Retsch, Haan, 
Germany). Homogenized leaf material was resuspended in lysis buffer of the MO 
BIO PowerSoil DNA isolation Kit (MO BIO Laboratories Inc., Carlsbad, CA, 
USA), transferred into lysis tubes, provided by the supplier, and DNA extraction 
was performed following the manufacturer's protocol. DNA concentrations were 
measured by PicoGreen dsDNA Assay Kit (Life technologies), and subsequently 
diluted to 3.5 ng µl⁻¹. Bacterial 16S rRNA genes were subsequently amplified using primers targeting the variable regions V5-V7 (799F and 1193R; Supplementary Data 7). Each sample was amplified in triplicate by two independent PCR mixtures (a total of 6 replicates per sample plus respective no-template controls). PCR products of each triplicate were subsequently combined, purified and subjected to 454 sequencing. The resulting sequences were demultiplexed and filtered for quality and length (average quality score >25, minimum length 319 bp, with no ambiguous bases and no errors in the barcode sequences allowed). High-quality sequences were subsequently processed using the UPARSE pipeline, and OTUs were taxonomically classified using the Greengenes database and the PyNAST method.
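The read-level filters listed above can be sketched as a single predicate: exact barcode match, no ambiguous bases, mean quality score above 25 and minimum length 319 bp. This is not the authors' pipeline. The sketch assumes that the sample barcode sits at the 5′ end of each read, that length is checked after the barcode is removed, and that any IUPAC ambiguity code counts as an ambiguous base. The function name and barcode labels are hypothetical.

```python
AMBIGUOUS = set("NRYKMSWBDHV")  # IUPAC codes other than A, C, G, T


def filter_read(seq, quals, barcodes, barcode_len, min_len=319, min_mean_q=25.0):
    """Return (sample, insert) for a read that passes all filters, else None.

    seq         -- read sequence (str)
    quals       -- per-base Phred scores (list of int), same length as seq
    barcodes    -- dict mapping exact barcode sequence -> sample name
    barcode_len -- length of the 5' barcode (assumed layout)
    """
    if len(seq) != len(quals) or not quals:
        return None
    sample = barcodes.get(seq[:barcode_len])
    if sample is None:                         # no barcode errors allowed
        return None
    if AMBIGUOUS & set(seq.upper()):           # no ambiguous bases allowed
        return None
    if sum(quals) / len(quals) <= min_mean_q:  # average quality must exceed 25
        return None
    insert = seq[barcode_len:]
    if len(insert) < min_len:                  # minimum length of 319 bp
        return None
    return sample, insert


if __name__ == "__main__":
    barcodes = {"ACGTAC": "root_sample_1"}     # hypothetical barcode table
    read = "ACGTAC" + "A" * 319
    print(filter_read(read, [30] * len(read), barcodes, barcode_len=6) is not None)
```

Applying the length cut-off after barcode removal is the stricter of the two possible readings of the stated criteria.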
High-throughput identification of leaf-, root- and soil-derived bacterial 
isolates by 454 pyrosequencing. We adopted a two-step barcoded PCR protocol in combination with 454 pyrosequencing to define V5-V8 sequences of the bacterial 16S rRNA genes of all leaf-, root- and soil-derived bacterial isolates (Supplementary Fig. 1). DNA of isolates was extracted by lysis of 6 µl of bacterial culture in 10 µl of buffer I (25 mM NaOH, 0.2 mM EDTA, pH 12) at 95 °C for 30 min. The pH was then lowered by addition of 10 µl of buffer II containing 40 mM Tris-HCl at pH 7.5. The position and taxonomy of isolates in 96-well microtitre plates were indexed by a two-step PCR protocol using the degenerate primers 799F and 1392R containing well- and plate-specific barcodes (Supplementary Data 7) to amplify the variable regions V5 to V8. During the first step of PCR amplification, DNA from 1.5 µl of lysed cells was amplified in a 25 µl reaction. The reaction contained 2 U DFS-Taq DNA polymerase, 1× complete buffer (both Bioron GmbH), 0.2 mM dNTPs (Life technologies), 0.2 µM of 1 of 96 barcoded forward primers with an 18-bp linker sequence (for example, Al_454_799F1_PCRI1_wells; Supplementary Data 7) and 0.2 µM reverse primer (454B_1392R). PCR amplification was performed under the following conditions: DNA was initially denatured at 95 °C for 2 min, followed by 40 cycles of 95 °C for 30 s, 50 °C for 30 s and 72 °C for 45 s, and a final elongation step at 72 °C for 10 min. PCR products of each 96-well microtitre plate were combined and subsequently purified in a two-step procedure. They were first purified using the Agencourt AMPure XP Kit (Beckman Coulter GmbH, Krefeld, Germany), and DNA fragments were then excised from a 1% agarose gel using the QIAquick Gel Extraction Kit (Qiagen). DNA concentration was measured by NanoDrop and diluted to 1 ng µl⁻¹.

During the second PCR step, 1 ng of pooled DNA (each pool represents one 
96-well microtitre plate) was amplified by 1.25 U PrimeSTAR HS DNA Polymerase, 
1x PrimeSTAR Buffer (both TaKaRa Bio S.A.S, Saint-Germain-en-Laye, France), 
0.2 mM dNTPs (Thermo Fisher Scientific Inc.), 0.2 µM of 1 of 96 barcoded forward primers targeting the 18-bp linker sequence (for example, P1_454_PCR2; Supplementary Data 7) and 0.2 µM reverse primer (454B_1392R) in a 50 µl reaction. The PCR cycling conditions were as follows: initial denaturation at 98 °C for 30 s, followed by 25 cycles of 98 °C for 10 s, 58 °C for 15 s and 72 °C for 30 s, and a
final elongation at 72°C for 5 min. PCR products were purified using the Agencourt 
AMPure XP Kit (Beckman Coulter GmbH) and QIAquick Gel Extraction Kit 
(Qiagen) as described for the purification of first step PCR amplicons. DNA con- 
centration was determined by PicoGreen dsDNA Assay Kit (Life technologies) 
and samples were pooled in equal amounts. The final PCR product libraries were 
sequenced on the Roche 454 Genome Sequencer GS FLX +. Each sequence con- 
tained a plate-barcode, a well-barcode and V5-V8 sequences. 
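Each read carries a plate barcode, added in the second PCR, and a well barcode, added in the first PCR, so an isolate's plate position can be recovered with two dictionary lookups. The sketch below assumes a hypothetical read layout: plate barcode at the 5′ end, then the 18-bp linker, then the well barcode. It uses exact matching only. The real barcode lengths and the linker and primer sequences are listed in Supplementary Data 7 and are not reproduced here. All labels in the sketch are hypothetical.

```python
from collections import defaultdict


def demultiplex(reads, plate_barcodes, well_barcodes, linker):
    """Group reads by (plate, well).

    Assumed layout (hypothetical): [plate barcode][linker][well barcode][16S V5-V8].
    plate_barcodes / well_barcodes map a barcode sequence -> plate or well label;
    all plate barcodes share one length, as do all well barcodes.
    Returns (dict of (plate, well) -> list of 16S fragments, number of unassigned reads).
    """
    p_len = len(next(iter(plate_barcodes)))
    w_len = len(next(iter(well_barcodes)))
    binned = defaultdict(list)
    unassigned = 0
    for read in reads:
        plate = plate_barcodes.get(read[:p_len])
        linker_ok = read[p_len:p_len + len(linker)] == linker
        w_start = p_len + len(linker)
        well = well_barcodes.get(read[w_start:w_start + w_len])
        if plate is None or well is None or not linker_ok:
            unassigned += 1
            continue
        binned[(plate, well)].append(read[w_start + w_len:])
    return dict(binned), unassigned


if __name__ == "__main__":
    linker = "T" * 18  # placeholder, not the real linker sequence
    reads = ["AAAA" + linker + "CCCC" + "ACGTACGT"]
    print(demultiplex(reads, {"AAAA": "plate_1"}, {"CCCC": "A1"}, linker))
```

Allowing barcode mismatches would recover more reads, but exact matching mirrors the "no errors in the barcode sequences" rule stated for the community profiling data.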

The sequences were quality filtered and demultiplexed according to well and plate identifiers. OTUs were clustered at 97% similarity using the UPARSE algorithm. Nucleotide BLAST (v. 2.2.29) was used to align representative sequences of isolated OTUs to culture-independent OTUs, and only hits with >97% sequence identity covering at least 99% of the length of the sequences were considered.
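The mapping of isolate OTUs to culture-independent OTUs keeps only hits with more than 97% identity that cover at least 99% of the sequence length. The sketch below applies this filter to BLAST+ tabular output (`-outfmt 6`). Its default twelve columns are qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue and bitscore. The sketch assumes coverage is measured on the query (isolate) sequence, with query lengths supplied separately. That is our reading of the text rather than a stated detail. The function name is hypothetical.

```python
from collections import defaultdict


def matching_otus(blast_lines, query_lengths, min_identity=97.0, min_coverage=0.99):
    """Map each isolate OTU to the culture-independent OTUs it matches.

    blast_lines   -- iterable of tab-separated lines in BLAST+ -outfmt 6 format
    query_lengths -- dict: query ID -> full length of the query sequence (bp)
    """
    matches = defaultdict(list)
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue
        f = line.rstrip("\n").split("\t")
        qseqid, sseqid, pident = f[0], f[1], float(f[2])
        qstart, qend = int(f[6]), int(f[7])
        coverage = (abs(qend - qstart) + 1) / query_lengths[qseqid]
        # Identity must be strictly above 97%; coverage at least 99% of the query.
        if pident > min_identity and coverage >= min_coverage:
            if sseqid not in matches[qseqid]:
                matches[qseqid].append(sseqid)
    return dict(matches)


if __name__ == "__main__":
    hit = "iso1\tenv1\t99.5\t400\t2\t0\t1\t400\t1\t400\t1e-200\t700"
    print(matching_otus([hit], {"iso1": 400}))
```

Requiring coverage on both query and subject would be a stricter alternative if "the length of the sequences" is read as applying to both.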
Preparation of A. thaliana leaf (At-LSPHERE), root (At-RSPHERE) and soil 
bacterial culture collections. Based on representative sequences of OTUs from 
this as well as previously published culture-independent community analysis, bac- 
terial CFUs in the culture collections with >97% 16S rRNA gene identity to root-, 
leaf- and soil-derived OTUs were purified by three consecutive platings on the 
respective solidified media before an individual colony was used to inoculate liquid 
cultures. These liquid cultures were used for validation by Sanger sequencing with 
both 799F and 1392R primers as well as for the preparation of glycerol stocks for 
the culture collections and for the extraction of genomic DNA for whole-genome 
sequencing. A total of 21 leaf-derived strains, previously described as phyllosphere bacteria, were added to the At-LSPHERE collection, although these were undetectable in the present culture-independent leaf community profiling.
Preparation of bacterial genomic DNA for whole-genome sequencing. To 
obtain high molecular weight genomic DNA of bacterial isolates in our culture 
collections, we used a modified DNA precipitation protocol and the Agencourt 
AMPure XP Kit (Beckman Coulter GmbH). For each bacterial liquid culture, 
cells were collected by centrifugation at 3,220g for 15 min, the supernatant was removed, and cells were resuspended in 5 ml SET buffer (75 mM NaCl, 25 mM EDTA, 20 mM Tris/HCl, pH 7.5). A total of 20 µl lysozyme solution (50 mg ml⁻¹, Sigma) was added, and the mixture was incubated for 30 min at 37 °C. Subsequently, 100 µl of 20 mg ml⁻¹ proteinase K (Sigma-Aldrich Chemie GmbH, Taufkirchen, Germany) and 10% SDS (Sigma-Aldrich Chemie GmbH) were added and mixed, and the mixture was incubated at 55 °C for 1 h with shaking every 15 min. If bacterial cells were insufficiently lysed, remaining cells were collected at 3,220g for 10 min and homogenized using the Precellys24 tissue lyser in combination with Lysing Matrix E tubes (MP Biomedicals) at 6,300 rpm for 30 s. After cell lysis, 2 ml 5 M NaCl and 5 ml chloroform were added and mixed by inversion for 30 min at room temperature. After centrifugation at 3,220g for 15 min, 6 ml of supernatant was transferred into fresh Falcon tubes, and 3.6 ml isopropanol was added and gently mixed. After precipitation at 4 °C for 30 min, genomic DNA was collected at 3,220g for 5 min, washed once with 1 ml 70% (v/v) ethanol, dried for 15 min at room temperature and finally dissolved in 250 µl elution buffer (Qiagen). Then 2 µl of 4 mg ml⁻¹ RNase A (Sigma-Aldrich Chemie GmbH) was added to the bacterial genomic DNA solution and incubated overnight at 4 °C.

The genomic DNA was subsequently purified using the Agencourt AMPure XP Kit (Beckman Coulter GmbH) and analysed by agarose gel (1% (w/v)) electrophoresis. Concentrations were estimated based on loaded Lambda DNA Marker (GeneRuler 1 kb Plus, Thermo Scientific), and approximately 1 µg of genomic DNA was transferred into microTUBE Snap-Cap AFA Fiber vials (Covaris Inc., Woburn, MA, USA). DNA was sheared into 350 bp fragments by two consecutive cycles of 30 s (duty cycle: 10%, intensity: 4, cycles/burst: 200) on a Covaris S2 machine (Covaris, Inc.). The Illumina sequencing libraries were prepared according to the manual of the NEBNext Ultra DNA Library Prep Kit for Illumina (New England Biolabs, USA). Quality and quantity were assessed at all steps by capillary electrophoresis (Agilent Bioanalyzer and Agilent TapeStation). Finally, libraries were quantified by fluorometry, immobilized and processed onto a flow cell with a cBot (Illumina Inc., USA), followed by sequencing-by-synthesis with TruSeq v3 chemistry on a HiSeq2500 (Illumina Inc., USA).

Genome assembly and annotation. Paired-end Illumina reads were subjected to 
quality and length trimming using Trimmomatic v. 0.33 and assembled using two independent methods (A5 and SOAPdenovo v. 20.1). In each case, the assembly with the smaller number of scaffolds was selected. Detailed assembly statistics for each sequenced isolate can be found in Supplementary Data 3 and 4. Identification of putative protein-encoding genes and annotation of the genomes were performed using GLIMMER v. 3.02. Functional annotation of genes was conducted using Prokka v. 1.11 and the SEED subsystems approach via the RAST server API. Additionally, KEGG Orthologue (KO) groups were annotated by first generating HMM models for each KO in the database using the HMMER toolkit (v. 3.1b2). Next, we employed the HMM models to search all predicted ORFs using the hmmsearch tool, with an E value threshold of 10 x 10-°. Only hits covering at least 70% of the protein sequence were retained, and for each gene the match with the lowest E value was selected.
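The post-processing of hmmsearch results can be sketched as follows: keep hits that cover at least 70% of the protein, then assign each gene the KO with the lowest E value. The sketch assumes per-domain tabular output (`--domtblout`). In that format, column 3 is the target length, column 7 the full-sequence E value and columns 18 and 19 the alignment start and end. Coverage is measured from a single domain alignment. The E-value cut-off is a parameter, and the value used in the example is illustrative only. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    protein: str    # target (predicted ORF) name
    ko: str         # query (KO profile HMM) name
    evalue: float   # full-sequence E value
    ali_from: int   # alignment start on the protein
    ali_to: int     # alignment end on the protein
    tlen: int       # protein length


def parse_domtblout(lines):
    """Yield Hit records from hmmsearch --domtblout lines (comment lines skipped)."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        f = line.split()
        yield Hit(protein=f[0], ko=f[3], evalue=float(f[6]),
                  ali_from=int(f[17]), ali_to=int(f[18]), tlen=int(f[2]))


def best_ko_per_protein(hits, max_evalue, min_coverage=0.70):
    """Keep hits at or below the E-value cut-off that cover >= 70% of the protein,
    then assign each protein the KO with the lowest E value."""
    best = {}
    for h in hits:
        coverage = (h.ali_to - h.ali_from + 1) / h.tlen
        if h.evalue > max_evalue or coverage < min_coverage:
            continue
        if h.protein not in best or h.evalue < best[h.protein].evalue:
            best[h.protein] = h
    return {protein: h.ko for protein, h in best.items()}


if __name__ == "__main__":
    example = ("orf_1 - 300 K00001 - 250 1e-30 105.2 0.1 1 1 2e-31 1e-30 "
               "104.9 0.1 5 240 1 280 1 282 0.97 hypothetical protein")
    print(best_ko_per_protein(parse_domtblout([example]), max_evalue=1e-5))
```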

Analyses of phylogenetic diversity within sequenced isolates. Each proteome 
was searched for the presence of the 31 well-conserved, single-copy, bacterial 
AMPHORA genes, designed for high-resolution phylogeny reconstruction of genomes. Subsequently, a concatenated alignment of these marker genes was performed using Clustal Omega v. 1.2.1. Based on this multiple sequence alignment, a species tree was inferred using FastTree v. 2.1, a maximum likelihood tool for phylogeny inference. Whole-genome taxonomic classification of sequenced isolates was conducted using taxator-tk, a homology-based tool for accurate classification of sequences. Analyses of phylogenetic diversity were
performed independently for each cluster based on pairwise tree distances between 
all isolates (Supplementary Data 5). 

Analyses of functional diversity between sequenced isolates. Analyses of func- 
tional diversity between sequenced isolates were conducted by generating, for each 
genome in the data set, a profile of presence/absence of each KO group (or phyletic 
pattern). Subsequently, a distance measure based on the Pearson correlation of each 
pair of phyletic patterns was calculated, which allowed us to embed each genome as 
a data point in a metric space. PCoA was performed on this space of functional distances using custom scripts written in R. Functional diversity within each family-level cluster was summarized by calculating the average distance between all pairs of genomes belonging to each cluster. Finally, we calculated RAs of each functional category based on the percentage of annotated KO terms assigned to each category. Enrichment tests were performed to identify differentially abundant categories between groups of genomes based on their origin (root versus leaf and root versus soil) using the non-parametric Mann-Whitney test (MWT). P values were corrected for multiple testing using the Bonferroni method, with a significance threshold α = 0.05.
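The functional-distance step can be sketched as follows. The Methods state only that the distance is "based on the Pearson correlation" of binary KO phyletic patterns. This sketch assumes the common choice d = 1 − r, so that identical patterns give 0 and perfectly anti-correlated patterns give 2. It then averages d over all genome pairs within a cluster. The function names and KO sets are hypothetical.

```python
import math
from itertools import combinations


def phyletic_pattern(kos, ko_universe):
    """Binary presence/absence vector of KO groups in a fixed KO order."""
    return [1 if ko in kos else 0 for ko in ko_universe]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant pattern")
    return sxy / math.sqrt(sxx * syy)


def functional_distance(x, y):
    """Assumed transform: 1 - Pearson r (ranges from 0 to 2)."""
    return 1.0 - pearson(x, y)


def mean_within_cluster_distance(patterns, members):
    """Average functional distance over all pairs of genomes in one cluster."""
    pairs = list(combinations(members, 2))
    if not pairs:
        raise ValueError("need at least two genomes")
    return sum(functional_distance(patterns[a], patterns[b]) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    universe = ["K00001", "K00002", "K00003", "K00004"]
    genomes = {"g1": {"K00001", "K00003"}, "g2": {"K00001", "K00003"},
               "g3": {"K00002", "K00004"}}
    patterns = {g: phyletic_pattern(k, universe) for g, k in genomes.items()}
    print(mean_within_cluster_distance(patterns, ["g1", "g2", "g3"]))
```

On binary vectors, Pearson r is the phi coefficient, so shared absences count towards similarity as well as shared presences.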

Recolonization experiments of leaf-, root- and soil-derived bacteria on 
Arabidopsis. Calcined clay, an inert soil substitute, was washed with water, sterilized twice by autoclaving and heat-dried until completely dehydrated. A. thaliana Col-0 seeds were surface-sterilized with ethanol and stratified overnight at 4 °C. Leaf-, root- and soil-derived bacteria of the culture collections were cultivated in 96-deep-well plates and subsequently pooled (in equal or unequal ratios) to prepare synthetic bacterial communities (SynComs) for inoculations below the carrying capacity of leaves and roots. To inoculate SynComs into the calcined clay matrix, the OD600 was adjusted to 0.5, and 1 ml (~2.75 × 10⁸ cells) was added to 70 ml of 0.5× MS medium (pH 7; including vitamins, without sucrose). This was mixed with 100 g calcined clay in Magenta boxes (~2.75 × 10⁶ cells per g calcined clay) directly before sowing of surface-sterilized seeds. Plants were grown at 22 °C with 11 h light and 54% humidity. Viable cell counts (CFUs) of root-associated bacteria were determined by serial dilutions of root homogenates after seven weeks of co-incubation and were 1.4 × 10° ± 8.4 × 10’ cells per gram root tissue. For leaf spray-inoculation of A. thaliana plants, bacterial SynComs were prepared as described above and adjusted to OD600 0.2. The solution was then diluted tenfold, and 170 µl (~1.87 × 10⁶ cells) was sprayed into each Magenta box containing four three-week-old plants using a TLC chromatographic reagent sprayer (BS124.000, Biostep GmbH, Jahnsdorf, Germany). The average volume per spraying event was determined by spraying repeatedly into 50 ml tubes and weighing the tubes before and after. All plants and corresponding unplanted clay samples were harvested under sterile conditions after a total incubation period of seven weeks. During harvest, leaves and roots of individual plants
were carefully separated using sterilized tweezers and scissors to avoid cross-contamination and were processed separately thereafter. All leaves obviously contaminated with clay particles or touching the ground were carefully removed and omitted from further processing. The remaining aerial parts of the four plants from one Magenta box were combined, transferred into Lysing Matrix E tubes (MP Biomedicals), frozen in liquid nitrogen and stored at −80 °C until used for DNA extraction. Roots from one Magenta box were pooled, washed twice in 5 ml PBS at 180 rpm for 20 min, dried on sterilized Whatman glass microfibre filters (GE Healthcare Life Sciences), transferred into Lysing Matrix E tubes (MP Biomedicals), frozen in liquid nitrogen and stored at −80 °C until further processing. The corresponding unplanted clay samples were washed in 100 ml PBS supplemented with 0.02% Silwet L-77 at 180 rpm for 10 min, and particles were then allowed to settle for 5 min. The supernatant was collected and centrifuged at 3,220g for 15 min. The pellet was subsequently resuspended in 1 ml water, transferred into Lysing Matrix E tubes (MP Biomedicals), frozen in liquid nitrogen and stored at −80 °C.
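The inoculum figures given for the recolonization experiments are mutually consistent, which a few lines of arithmetic confirm. The sketch takes the stated calibration (1 ml at OD600 0.5 ≈ 2.75 × 10⁸ cells) and assumes that cell density scales linearly with OD600. It also shows how the per-spray volume follows from weighing the tubes, assuming a liquid density of 1 g ml⁻¹. The function names and weighing values are hypothetical.

```python
CELLS_PER_ML_AT_OD_0_5 = 2.75e8  # stated: 1 ml at OD600 0.5 contains ~2.75 x 10^8 cells


def cells_per_ml(od600):
    """Cell density, assuming a linear relation between OD600 and cell number."""
    return CELLS_PER_ML_AT_OD_0_5 * od600 / 0.5


def clay_cells_per_gram(od600=0.5, volume_ml=1.0, clay_g=100.0):
    """Cells added per gram of calcined clay."""
    return cells_per_ml(od600) * volume_ml / clay_g


def spray_cells(od600=0.2, dilution=10.0, volume_ul=170.0):
    """Cells delivered in one spray event after the tenfold dilution."""
    return cells_per_ml(od600) / dilution * volume_ul / 1000.0


def volume_per_spray_ul(mass_before_g, mass_after_g, n_sprays, density_g_per_ml=1.0):
    """Average volume per spray from tube mass before and after repeated spraying."""
    return (mass_after_g - mass_before_g) / density_g_per_ml / n_sprays * 1000.0


if __name__ == "__main__":
    print(f"{clay_cells_per_gram():.3g} cells per g clay")  # ~2.75e+06
    print(f"{spray_cells():.3g} cells per spray")            # ~1.87e+06
```

For example, a tube that gains 0.85 g over five sprays implies about 170 µl per spray, the volume used above.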

To prepare DNA for bacterial 16S rRNA gene-based community analysis, all
samples were homogenized twice using a Precellys24 tissue lyser (Bertin
Technologies), DNA was extracted, and concentrations were measured with the
PicoGreen dsDNA Assay Kit (Life Technologies). Bacterial 16S rRNA genes were then
amplified with degenerate PCR primers (799F and 1193R) targeting the variable
regions V5-V7 (Supplementary Data 7). Each sample was amplified in triplicate
(plus a respective no-template control) in a 25 µl reaction volume containing
2 U DFS-Taq DNA polymerase, 1× incomplete buffer (both Bioron GmbH, Ludwigshafen,
Germany), 2 mM MgCl2, 0.3% BSA, 0.2 mM dNTPs (Life Technologies GmbH, Darmstadt,
Germany), 0.3 µM each of forward and reverse primer and 10 ng of template DNA.
After an initial denaturation step at 94 °C for 2 min, the targeted region was
amplified by 25 cycles of 94 °C for 30 s, 55 °C for 30 s and 72 °C for 60 s,
followed by a final elongation step of 5 min at 72 °C. The three independent PCR
reactions were pooled, and the remaining primers and nucleotides were removed by
addition of 20 U exonuclease I and 5 U Antarctic phosphatase (both New England
BioLabs GmbH, Frankfurt, Germany) and incubation for 30 min at 37 °C in the
corresponding 1× Antarctic phosphatase buffer. Enzymes were heat-inactivated, and
the digested mixture was used as template for the second-step PCR with the
Illumina-compatible primer B5-F and one of 96 differentially barcoded reverse
primers (B5-1 to B5-96, Supplementary Data 7). All samples were amplified in
triplicate for 10 cycles using the same conditions as in the first-step PCR.
Technical replicates of each sample were combined and run on a 1.5% (w/v) agarose
gel, and the bacterial 16S rRNA gene amplicons were extracted using the QIAquick
Gel Extraction Kit (Qiagen) according to the manufacturer’s instructions. DNA
concentration was subsequently measured using the PicoGreen dsDNA Assay Kit (Life
Technologies), and 100 ng of each sample were combined. Final amplicon libraries
were cleaned twice using the Agencourt AMPure XP Kit (Beckman Coulter GmbH) and
sequenced on the Illumina MiSeq platform with a MiSeq Reagent Kit v3 following the
2 × 350 bp paired-end sequencing protocol (Illumina Inc., USA).
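To combine 100 ng of each sample, each PicoGreen concentration has to be converted into a pipetting volume. The following is a minimal sketch of that arithmetic. It assumes concentrations are given in ng/µl. The sample names and values are hypothetical.

```python
def pooling_volumes_ul(concentrations_ng_per_ul, target_ng=100.0):
    """Volume (µl) of each amplicon sample needed to contribute target_ng to the pool."""
    volumes = {}
    for sample, conc in concentrations_ng_per_ul.items():
        if conc <= 0:
            raise ValueError(f"{sample}: concentration must be positive")
        volumes[sample] = target_ng / conc  # ng / (ng/µl) = µl
    return volumes


# Hypothetical PicoGreen readings (ng/µl)
vols = pooling_volumes_ul({"root_1": 20.0, "leaf_1": 12.5, "clay_1": 50.0})
print(vols)  # {'root_1': 5.0, 'leaf_1': 8.0, 'clay_1': 2.0}
```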

Forward and reverse reads were joined, demultiplexed and subjected to quality
controls using scripts from the QIIME toolkit, v.1.8.0 (Phred > 20). The
resulting high-quality sequences were further clustered at 97% sequence identity
together with Sanger sequences of leaf, root and soil isolates using the UPARSE
pipeline as described above. Taxonomic assignments of representative sequences
were performed as explained in the previous sections. Only OTUs corresponding to
one or more Sanger 16S rRNA gene sequences of purified strains in the
At-RSPHERE, At-LSPHERE or soil collection were selected and designated
‘indicator OTUs’. The heat maps were generated using the ggplot2 R package.
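Isolate Sanger sequences were clustered together with the MiSeq reads, so one plausible reading of the indicator-OTU rule is: keep every OTU that contains at least one isolate sequence. The sketch below shows that set logic. The OTU-to-members mapping and all identifiers are hypothetical, and the format is not the actual UPARSE output.

```python
def indicator_otus(otu_members, isolate_ids):
    """Return OTUs (sorted) whose member sequences include at least one isolate Sanger sequence.

    otu_members: hypothetical mapping OTU id -> list of sequence ids clustered into it.
    isolate_ids: ids of Sanger 16S rRNA sequences from the culture collections.
    """
    isolate_ids = set(isolate_ids)
    return sorted(otu for otu, members in otu_members.items()
                  if isolate_ids.intersection(members))


# Hypothetical clustering result
clusters = {
    "OTU_1": ["read_001", "read_002", "Root123"],
    "OTU_2": ["read_003"],
    "OTU_3": ["Leaf45", "read_004"],
}
print(indicator_otus(clusters, ["Root123", "Leaf45", "Soil7"]))  # ['OTU_1', 'OTU_3']
```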

Accession numbers. Sequencing reads (454 16S rRNA, MiSeq 16S rRNA and 
WGS HiSeq reads) have been deposited in the European Nucleotide Archive 
(ENA) under accession numbers PRJEB11545, PRJEB11583 and PRJEB11584. 
Genome assemblies and annotations corresponding to the leaf, root and soil cul- 
ture collections have been deposited in the National Center for Biotechnology 
Information (NCBI) BioProject database under accession numbers PRJNA297956, 
PRJNA297942 and PRJNA298127, respectively. 

Code availability. All scripts for computational analysis and corresponding raw data 
are available at http://www.mpipz.mpg.de/R_scripts. The sequenced bacterial 
genomes as well as any future updates are available at http://www.at-sphere.com. 
Extended Data Figure 1 | Culture-dependent coverage of A. thaliana
root- and leaf-associated OTUs identified in several cultivation-independent
studies. a–d, The inner circle depicts taxonomic assignments of the top 100
root-associated OTUs (filled dots) for the indicated phyla and families that
were identified in the current (a), ref. 6 (b) and ref. 12 (c) studies with
Cologne-soil-grown plants, and in the current leaf (d) study at locations
around Tübingen and Zurich. Black squares of the outer ring highlight OTUs
sharing >97% 16S rRNA gene similarity with the Arabidopsis root or leaf
bacterial culture collection.
Extended Data Figure 2 | 16S rRNA gene community profiling of
phyllosphere samples from different locations. a–d, The indicated
beta-diversity indices were calculated from leaf samples (n = 60) collected
from natural A. thaliana populations in the areas around Tübingen and Zurich.
The indicated colour code refers to sampling locations, sampling sites,
sampling season, and combined or individual leaves of respective plants.
Extended Data Figure 3 | At-RSPHERE, At-LSPHERE and soil bacterial
culture collections. a, At-RSPHERE (n = 206 isolates), a culture collection
of the A. thaliana root microbiota. b, At-LSPHERE (n = 224 isolates),
a culture collection of the A. thaliana leaf microbiota. c, Bacteria isolated
from Cologne soil (n = 33 isolates). Numbers inside white circles indicate
the number of bacterial isolates sharing >97% sequence identity but
isolated from independent roots, leaves and soil batches.
Tübingen. c, d, Rank abundance plots of top 20 genera (c) and OTUs (d) in
leaf bacterial communities from Zurich and Tübingen with corresponding
genera detected in root bacterial communities from Cologne. Boxplot
whiskers extend to the most extreme data point which is no more than
Extended Data Figure 5 | Phylogenetic distribution of ‘carbohydrate
metabolism’ genes across sequenced isolates. a, Phylogeny of sequenced
leaf (n = 206), root (n = 194) and soil (n = 32) isolates based on the
concatenated alignment of the 31 conserved AMPHORA phylogenetic
marker genes. The origin of each genome (leaf, root or soil) is shown by
different shapes and their taxonomic affiliation (phylum level) is depicted
using various colours. Shaded areas correspond to the different clusters of
genomes and are annotated with their consensus taxonomy (family level).
b, Relative abundance of protein-coding genes classified as belonging
to the KEGG general term ‘carbohydrate metabolism’, measured as a
percentage of annotated proteins per genome.
Extended Data Figure 6 | Phylogenetic distribution of ‘xenobiotic
biodegradation and metabolism’ genes across sequenced isolates.
a, Phylogeny of sequenced leaf (n = 206), root (n = 194) and soil (n = 32)
isolates based on the concatenated alignment of the 31 conserved AMPHORA
phylogenetic marker genes. The origin of each genome (leaf, root or soil) is
shown by different shapes and their taxonomic affiliation (phylum level; class
level for Proteobacteria) is depicted using various colours. Shaded areas
correspond to the different clusters of genomes and are annotated with their
consensus taxonomy (family level). b, Relative abundance of protein-coding
genes classified as belonging to the KEGG general term ‘xenobiotics
biodegradation and metabolism’, measured as a percentage of annotated
proteins per genome.
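The per-genome measure used in Extended Data Figs 5b and 6b is a simple normalization: the number of proteins assigned to a KEGG category, divided by the number of annotated proteins in that genome. The following is a minimal sketch, assuming per-category counts are already available. The function name and the counts are hypothetical.

```python
def percent_of_annotated(category_counts, total_annotated):
    """Express per-category protein counts as a percentage of a genome's annotated proteins."""
    if total_annotated <= 0:
        raise ValueError("total_annotated must be positive")
    return {category: 100.0 * count / total_annotated
            for category, count in category_counts.items()}


# Hypothetical genome with 4,000 annotated proteins
profile = percent_of_annotated(
    {"carbohydrate metabolism": 380,
     "xenobiotics biodegradation and metabolism": 90},
    total_annotated=4000,
)
print(profile)
# {'carbohydrate metabolism': 9.5, 'xenobiotics biodegradation and metabolism': 2.25}
```

Normalizing by annotated proteins, rather than by all predicted genes, keeps genomes with many unannotated ORFs from appearing depleted in every category.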
Extended Data Figure 7 | V. vinifera metagenome comparison.
a, b, Functional enrichment analysis of V. vinifera root and soil shotgun
metagenomes (a; n = 47) compared to A. thaliana culture collection
root and soil genomes (b; n = 432). Functional category abundances
correspond to the percentage of annotated genes in each genome or
metagenome sample. Boxplot whiskers extend to the most extreme data
point which is no more than 1.5 times the interquartile range from the
upper or lower quartiles.
within the same cluster and between different clusters of the RS in clay
experiments. c, Comparison of pairwise distances between input samples
and between input and output samples of the L spray experiments.
d, Comparison of pairwise distances between samples within the same
cluster and between different clusters of the L spray experiments.
e, Comparison of pairwise distances between samples within the same
cluster and between different clusters of the leaf output across experiments.
asterisks were subjected to a Student's t-test (P < 0.001 in each case). L in
clay was tested with 6 independently prepared SynComs (n = 6); RSL in
clay was tested with 3 independently prepared SynComs, each used for
3 independent inoculations (n = 9). All other experiments were tested with
6 independently prepared SynComs and each preparation was used for
3 independent inoculations (n = 18). L, leaf-derived strains; RS, root- and
soil-derived strains.
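The within-cluster versus between-cluster comparison can be sketched as two steps. First, split the upper triangle of a distance matrix by cluster label. Second, compute a two-sample Student's t statistic with pooled variance. The code below is a minimal illustration under those assumptions, using a hypothetical 4-sample matrix. It only shows the arithmetic and does not address the non-independence of pairwise distances that share a sample. For a P value, `scipy.stats.ttest_ind(a, b)` (equal variances by default) returns the same statistic.

```python
import math
from itertools import combinations
from statistics import mean, variance


def within_between(dist, labels):
    """Split upper-triangle pairwise distances into within- and between-cluster lists."""
    within, between = [], []
    for i, j in combinations(range(len(labels)), 2):
        target = within if labels[i] == labels[j] else between
        target.append(dist[i][j])
    return within, between


def student_t(a, b):
    """Two-sample Student's t statistic using a pooled (equal) variance."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))


# Hypothetical distance matrix for four samples in two clusters (A, B)
dist = [[0.0, 0.1, 0.8, 0.7],
        [0.1, 0.0, 0.9, 0.8],
        [0.8, 0.9, 0.0, 0.2],
        [0.7, 0.8, 0.2, 0.0]]
within, between = within_between(dist, ["A", "A", "B", "B"])
print(within, between)             # [0.1, 0.2] [0.8, 0.7, 0.9, 0.8]
print(student_t(within, between))  # negative: within-cluster distances are smaller
```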
Extended Data Figure 9 | Similarity of rank abundances of SynCom
outputs with corresponding root- and leaf-associated OTUs of plants
grown in natural environments. a–c, Rank abundance plots of SynCom
root outputs (n = 69) with corresponding root-associated OTUs in
natural communities (n = 8) from plants grown in the present study in
Cologne soil at the taxonomic ranks of phylum (a), order (b) and
family (c). d–f, Rank abundance plots of SynCom leaf outputs (n = 69)
with corresponding leaf-associated OTUs in natural communities (n = 60)
from plants grown in the present study around Tübingen or Zurich at the
taxonomic ranks of phylum (d), order (e) and family (f). Boxplot whiskers
extend to the most extreme data point which is no more than 1.5 times the
interquartile range from the upper or lower quartiles.
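The whisker rule repeated in these captions (whiskers extend to the most extreme data point within 1.5 times the interquartile range of the quartiles) can be sketched as follows. Quartiles are computed here by linear interpolation between order statistics (`statistics.quantiles` with `method="inclusive"`). That convention is an assumption: other quartile definitions can shift the fences slightly. The example data are hypothetical.

```python
from statistics import quantiles


def whisker_ends(values, k=1.5):
    """Return (lower, upper) whisker ends for a Tukey-style boxplot."""
    # Quartiles by linear interpolation (assumed convention)
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - k * iqr, q3 + k * iqr
    # Whiskers stop at the most extreme observed points inside the fences
    lower = min(v for v in values if v >= low_fence)
    upper = max(v for v in values if v <= high_fence)
    return lower, upper


# Hypothetical relative abundances with one outlier
data = [1, 2, 3, 4, 5, 6, 7, 8, 100]
print(whisker_ends(data))  # (1, 8); 100 lies beyond the upper fence and is plotted as an outlier
```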
Extended Data Figure 10 | Fractional contribution of At-LSPHERE- and
At-RSPHERE-specific OTUs and SynCom competition supports host
organ-specific community assemblies. a, Fractional contribution of
At-LSPHERE- and At-RSPHERE-specific OTUs in the input, leaf and root
output communities in the ‘RSL in clay’ experiment (n = 9). b, c, PCoA of
Bray–Curtis distances of root (b; n = 21) and leaf (c; n = 21) outputs of
the ‘R in clay’, ‘RS in clay’ and ‘R spray’ SynCom experiments. R, root-derived
isolates; S, soil-derived isolates; L, leaf-derived isolates. The RSL in clay
experiment was tested with 3 independently prepared SynComs, each used for
3 independent inoculations. All other experiments were tested with 3
independently prepared SynComs and each preparation was used for 3
independent inoculations. Boxplot whiskers extend to the most extreme data
point which is no more than 1.5 times the interquartile range from the upper
or lower quartiles.
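The PCoA in panels b and c starts from a matrix of pairwise Bray–Curtis dissimilarities. The following is a minimal pure-Python sketch of that matrix. It assumes each sample is a vector of non-negative OTU abundances in a shared OTU order. Whether counts or relative abundances were used is not stated here, so that choice is an assumption. The resulting matrix could then be passed to a PCoA routine, for example scikit-bio's `skbio.stats.ordination.pcoa`.

```python
def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two non-negative abundance profiles."""
    if len(u) != len(v):
        raise ValueError("profiles must cover the same OTUs in the same order")
    total = sum(u) + sum(v)
    if total == 0:
        return 0.0  # convention chosen here for two empty profiles
    return sum(abs(a - b) for a, b in zip(u, v)) / total


def bray_curtis_matrix(profiles):
    """Symmetric matrix of pairwise Bray-Curtis dissimilarities."""
    n = len(profiles)
    return [[bray_curtis(profiles[i], profiles[j]) for j in range(n)]
            for i in range(n)]


# Hypothetical OTU count profiles for two output samples
print(bray_curtis([10, 0, 5], [4, 6, 5]))  # 0.4
```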
Phosphorylation and linear ubiquitin
direct A20 inhibition of inflammation 


Ingrid E. Wertz, Kim Newton, Dhaya Seshasayee, Saritha Kusam, Cynthia Lam, Juan Zhang, Nataliya Popovych,
Elizabeth Helgason, Allyn Schoeffler, Surinder Jeet, Nandhini Ramamoorthi, Lorna Kategaya, Robert J. Newman,
Keisuke Horikawa, Debra Dugger, Wendy Sandoval, Susmith Mukund, Anuradha Zindal, Flavius Martin, Clifford Quan,
Jeffrey Tom, Wayne J. Fairbrother, Michael Townsend, Soren Warming, Jason DeVoss, Jinfeng Liu, Erin Dueber,
Patrick Caplazi, Wyne P. Lee, Christopher C. Goodnow, Mercedesz Balazs, Kebing Yu, Ganesh Kolumam & Vishva M. Dixit


Inactivation of the TNFAIP3 gene, encoding the A20 protein, is associated with critical inflammatory diseases including 
multiple sclerosis, rheumatoid arthritis and Crohn’s disease. However, the role of A20 in attenuating inflammatory 
signalling is unclear owing to paradoxical in vitro and in vivo findings. Here we utilize genetically engineered mice 
bearing mutations in the A20 ovarian tumour (OTU)-type deubiquitinase domain or in the zinc finger-4 (ZnF4) ubiquitin-binding
motif to investigate these discrepancies. We find that phosphorylation of A20 promotes cleavage of Lys63-linked
polyubiquitin chains by the OTU domain and enhances ZnF4-mediated substrate ubiquitination. Additionally, levels of 
linear ubiquitination dictate whether A20-deficient cells die in response to tumour necrosis factor. Mechanistically, linear 
ubiquitin chains preserve the architecture of the TNFR1 signalling complex by blocking A20-mediated disassembly of 
Lys63-linked polyubiquitin scaffolds. Collectively, our studies reveal molecular mechanisms whereby A20 deubiquitinase 
activity and ubiquitin binding, linear ubiquitination, and cellular kinases cooperate to regulate inflammation and cell death. 


Debilitating autoimmune syndromes and inflammatory diseases are
associated with inactivation of the TNFAIP3 gene, which encodes
the A20 protein. Despite the well-validated role of A20 in attenuating
inflammation, fundamental mechanistic questions regarding A20
function remain unanswered. For example, A20 inactivation enhances
pro-survival signalling and promotes expression of proteins that
antagonize cell death, yet A20 deficiency is also reported to sensitize cells
to TNF-induced death. Furthermore, A20 contains an OTU-type
deubiquitinase domain that, in cells, cleaves scaffolding Lys63 (K63)-linked
polyubiquitin to disassemble inflammatory receptor signalling
complexes including tumour necrosis factor receptor-1 (TNFR1); however,
in vitro A20 cleaves K48-linked but not K63-linked polyubiquitin.
Finally, the A20 ZnF4 motif binds ubiquitin and facilitates substrate
ubiquitination, but A20 ubiquitin ligase function has not been clearly
demonstrated (Extended Data Fig. 1a).


A20 edits ubiquitination of TNFR1 and associated proteins

We performed proteomic, biochemical and in vivo analyses to investigate
discrepancies regarding A20 function. To this end, we generated three strains
of mice to compare the effect of inactivating A20 functional motifs in vivo:
the OTU catalytic Cys103 was mutated to Ala to abolish A20 deubiquitinase
activity in Tnfaip3OTU/OTU mice, Cys609 and Cys612 were mutated to Ala to
disrupt ZnF4 structure in Tnfaip34O"40% mice, and Tyr599 and Phe600 were
mutated to Ala to compromise ZnF4 ubiquitin binding in Tnfaip340"4" mice
(Extended Data Fig. 2). In vivo studies confirmed heightened sensitivity
of these mutant mice to TNF challenge (Extended Data Fig. 3a–d),
and Tnfaip3OTU/OTU and Tnfaip340"49 mice had more severe myelin
oligodendrocyte glycoprotein-induced experimental autoimmune
encephalomyelitis (MOG-EAE), a TNF-regulated disease model
(Extended Data Fig. 3e, f). These findings in Tnfaip3OTU/OTU mice and
in Tnfaip3*?0'"49* mice agree with a previous study. Importantly,
the comparable phenotypes of Tnfaip340"4* and Tnfaip340"/4
mice indicate that ZnF4 ubiquitin binding is critical for A20 function
(Extended Data Fig. 3c, d). We next characterized signalling complexes
in TNF-treated cells derived from Tnfaip3OTU/OTU and Tnfaip340"49"
mice. Protracted association of TNF receptor-associated death domain
(TRADD), transforming growth factor-β-activated kinase-1 (TAK1)
and modified receptor-interacting protein kinase-1 (RIPK1), but not
IκB kinase-β (IKKβ), with activated TNFR1 was apparent in both types
of A20 mutant cells relative to wild-type cells (Fig. 1a, b). Prolonged
association of TAK1 and modified RIPK1 with activated TNFR1
corroborated enhanced downstream MKK3, MKK4, JNK and p38
MAPK activity, whereas transient IKKβ association was reflected in the
modest enhancement of downstream NF-κB signalling relative to wild-type
cells (Fig. 1a, b, Extended Data Fig. 4d–g). A20 OTU(C103A),
A20 ZnF4(C609A,C612A) and A20 ZnF4(Y599A,F600A) were all
expressed and recruited to liganded TNFR1 at levels equivalent to or
greater than wild-type A20, thereby excluding reduced association
as an explanation for enhanced TNFR1 signalling (Extended Data
Fig. 4a–c). We reasoned that the recruitment efficiency of TAK1 and
IKKβ might reflect the global ubiquitination status of activated TNFR1
signalling components and thus serve as a functional readout of the
effects of A20 OTU and ZnF4 domain inactivation. More specifically,
because TAB2/3 in the TAK1 complex have the highest affinity for
K63-linked polyubiquitin, the more pronounced differential in
TAK1 recruitment between Tnfaip3OTU/OTU or Tnfaip349" cells
94080, USA. Department of Cancer Biology and Therapeutics, The John Curtin School of Medical Research, The Australian National University, Canberra, Australian Capital Territory 2601,
Australia. Protein Chemistry, Genentech, South San Francisco, California 94080, USA. Structural Biology, Genentech, South San Francisco, California 94080, USA. Bioinformatics, Genentech,
South San Francisco, California 94080, USA. Pathology, Genentech, South San Francisco, California 94080, USA. Immunogenomics Laboratory, Immunology Division, Garvan Institute of
Medical Research, 384 Victoria Street, Darlinghurst, New South Wales 2010, Sydney, Australia. Present address: Gilead Sciences, Inc., Department of Biology, Foster City, California 94404, USA.


370 | NATURE | VOL 528 | 17 DECEMBER 2015 


© 2015 Macmillan Publishers Limited. All rights reserved 


[Figure 1 panels a and b: immunoblots of Flag (Flag-TNF) immunoprecipitates and whole-cell lysates (WB: TNFR1, RIPK1, TAK1, IKKβ, TRADD) from wild-type (WT) and A20 mutant cells, including A20 OTU(C103A), over a TNF time course (min).]


Figure 1 | A20 OTU and ZnF4 domains regulate TNFR1 ubiquitination.
a, Immunoblot analysis of Flag-TNF-engaged immunocomplexes in
wild-type and Tnfaip3OTU/OTU MEFs with corresponding whole-cell lysates.
Asterisk, background band. UnRx, untreated. b, Immunoblot analysis of
Flag-TNF-engaged immunocomplexes in wild-type and Tnfaip3“40740s
MEFs and the corresponding whole-cell lysates. c, Characterization of


relative to wild-type cells reflects the significant roles that the A20
OTU and ZnF4 domains play in regulating the abundance of K63-ubiquitinated
TNFR1 signalling proteins. Moreover, because NEMO
within the IKK complex preferentially binds linear chains, the modest
differential in IKKβ recruitment and downstream NF-κB stimulation
between wild-type and Tnfaip3OTU/OTU or Tnfaip34O9 cells indicates
that A20 has minimal impact on linear ubiquitination.

To evaluate this idea, we first used unbiased proteomics approaches.
Anti-Flag-TNF or anti-K-ε-GG immunoprecipitates from TNF-treated
cells were analysed by mass spectrometry to identify TNFR1-associated
proteins and ubiquitinated proteins, respectively.
RIPK1 and TNFR1 were identified as two highly ubiquitinated proteins
recruited to activated TNFR1 (Extended Data Fig. 5a). Robustly elevated
levels of highly modified TNFR1 were also evident in Tnfaip3OTU/OTU,
Tnfaip32099 and Tnfaip340"4™ cells, and in Tnfaip3−/− cells, relative to
wild-type cells (Fig. 1a, b, Extended Data Figs 4a–c and 5b). TNFR1 in
wild-type MEFs was modified predominantly with linear polyubiquitin
after stimulation with TNF or Flag-epitope-tagged TNF, whereas RIPK1
was modified primarily with K63-linked polyubiquitin (Fig. 1c, Extended Data
Fig. 5c). Mass spectrometry identified K256 as a TNFR1 ubiquitination
site (Fig. 1d, Supplementary Information a, b), and expression of
a murine K256R mutant in HEK293T cells decreased TNF-induced
ubiquitination (Extended Data Fig. 5d). Thus both A20 OTU and ZnF4
domains regulate the ubiquitination status of liganded TNFR1, in addition
to RIPK1, to limit TNF signalling. Our next aim was to understand
how A20 OTU and ZnF4 domains regulate specific polyubiquitin chain
types in the TNFR signalling complex.

We first focused on the A20 OTU deubiquitinase domain. K63-linked
polyubiquitination of TNFR1-associated RIPK1 was more
robust in Tnfaip3OTU/OTU cells than in wild-type cells (Extended Data
Fig. 5e), and only K63-linked, but not linear, ubiquitination of TNFR1 was
enhanced in Tnfaip3OTU/OTU cells relative to wild-type cells (Extended
Data Fig. 5f). These data are consistent with the protracted TAK1
recruitment to K63-linked polyubiquitinated proteins and enhanced
MAPK activation in Tnfaip3OTU/OTU cells. The negligible effect of A20
OTU domain inactivation on linear ubiquitination corroborates
the modest differential in IKKβ recruitment to linear-ubiquitinated
TNFR1 complex proteins and subtle enhancement of NF-κB activity in
TNFR1 or RIPK1 ubiquitination in wild-type MEFs using ubiquitin
(Ub)-linkage-selective antibodies. For lysates see Extended Data Fig. 5c.
d, Murine TNFR1 is ubiquitinated at K256. Mass spectrometry analysis of
ubiquitinated TNFR1 peptides isolated from TNF-treated Tnfaip3OTU/OTU
MEFs. Gel source data are in Supplementary Figs 1, 2, 8. Data represent
two to four biological replicates.


Tnfaip3OTU/OTU relative to wild-type cells (Fig. 1a, Extended Data Fig. 4f).
Mass spectrometry revealed that approximately 1.8 times more TNF
receptors are ubiquitinated at the K256 site in Tnfaip3OTU/OTU MEFs
than in wild-type MEFs, a value consistent with the increased RIPK1
ubiquitination (Extended Data Fig. 5g, Supplementary Information c)
evident in Tnfaip3OTU/OTU cells compared to wild-type cells (Fig. 1a,
Extended Data Figs 4a and 5b, e, f). Inactivation of A20 deubiquitinase
activity therefore does not increase the number of ubiquitination sites
on activated TNFR1 or RIPK1; instead, a larger fraction of these proteins
remains ubiquitinated and drives enhanced signalling.


Phosphorylated A20 hydrolyses K63-polyubiquitin 

These data revealed a conundrum: while the A20 OTU domain clearly
regulates K63-linked ubiquitination in vivo, recombinant A20 cleaves
K63-linked polyubiquitin inefficiently in vitro. Surprisingly, whereas
Escherichia coli-expressed A20 failed to cleave K63-linked tetraubiquitin,
wild-type A20 but not A20 OTU(C103A) purified from mammalian cells
cleaved K63-polyubiquitin (Fig. 2a). Similar results were
obtained with the more physiological substrates K48- or K63-linked
tetraubiquitin ligated to a single Lys residue of an HA-epitope-tagged
RIPK1 peptide (Extended Data Fig. 6a, b). No A20 preparation showed
significant cleavage of linear tetraubiquitin (Extended Data Fig. 6c).
Thus A20 post-translational modifications or interactions unique to
mammalian cells could account for the activity of A20 towards K63-linked
ubiquitin. Mass spectrometry identified four conserved residues
phosphorylated on murine A20 and on human A20 expressed in
mammalian cells, but not on human A20 expressed in bacteria: Ser381,
Ser480, Ser565 and Thr625 (Extended Data Fig. 6d and Supplementary
Information d, e). Notably, IKKβ-mediated phosphorylation of Ser381
enhances downregulation of pro-inflammatory signalling by A20 through
an unknown mechanism. Alanine substitution of all four phosphorylated
residues or of Ser381 alone attenuated cleavage of K63-linked
tetraubiquitin (Fig. 2b, Extended Data Fig. 6e). IKKβ-mediated
phosphorylation of bacterially expressed A20 facilitated cleavage of
K63-linked tetraubiquitin but not linear tetraubiquitin (Fig. 2c, Extended
Data Fig. 6f, g), and IKKβ-phosphorylated wild-type A20, but not the OTU
mutant, deubiquitinated TNFR1 and associated proteins (Fig. 2d).
Our finding that phosphorylation promotes A20-mediated cleavage of
Figure 2 | A20 phosphorylation promotes hydrolysis of K63-polyubiquitin
chains. a, Right, time course of K63- or K48-linked tetraubiquitin (Ub4)
cleavage by wild-type or A20 OTU(C103A) proteins purified from E. coli
or HEK293T cells. Left, unreacted A20 input samples. UnRx, unreacted.
b, Cleavage of K63-linked Ub4 by wild-type or S381A A20 purified from
HEK293T cells. c, Cleavage efficacy of K63-linked Ub4 by E. coli-derived,
IKKβ-phosphorylated A20, IKKβ alone, or wild-type A20 alone. d, Time
course of K63-linked Ub4 by E. coli-expressed, IKKβ-phosphorylated
wild-type or A20 OTU(C103A). Ubiquitinated TNFR1 substrate was
purified from Flag-TNF-treated Tnfaip3OTU/OTU MEFs. Gel source data are
in Supplementary Figs 2, 3. Data represent two to five biological replicates.


K63-linked ubiquitin chains explains why A20 phosphorylation
suppresses inflammatory signalling, and reveals a mechanism by which
the activity of deubiquitinases towards certain polyubiquitin chains can be
modulated by post-translational modifications.


Linear polyubiquitin dictates the fate of A20-inactivated cells

Our biochemical and cellular studies show that the A20 OTU domain
depolymerizes K63-polyubiquitin chains (Fig. 2a–c, Extended Data
Figs 5e, f and 6a, e) but not linear polyubiquitin (Extended Data Fig.
6c, g). Linear chains could, however, modulate A20 function. For
example, A20 inactivation enhances pro-survival MAPK and NF-κB
signalling and promotes expression of proteins that antagonize cell
death, yet A20 deficiency paradoxically sensitizes cells to apoptotic
and necroptotic TNF-induced death, and A20 expression protects
against TNF-induced apoptosis. However, the factors that determine
whether A20 inactivation fosters cellular survival or demise are
unknown. Because attenuated linear ubiquitination favours disassembly
of the proximal TNFR1 signalling complex and promotes cell death,
we reasoned that linear chains might regulate whether compromised
A20 function promotes cell survival or death. Knockdown of haem-oxidized
IRP2 ubiquitin ligase-1-interacting protein (HOIP), a component of the
linear ubiquitin chain assembly complex (LUBAC),
reduced linear but not K63-linked ubiquitination of TNFR1 (Extended
Data Fig. 7a) and, consistent with previous reports, sensitized
TNF-treated wild-type MEFs to death (Fig. 3a, Extended Data Fig. 7b).
Importantly, Tnfaip3OTU/OTU MEFs were significantly more sensitive than
wild-type MEFs to killing by TNF after HOIP knockdown (Fig. 3a, b,
Extended Data Fig. 7b; active caspase quantitation not shown). Thus linear
ubiquitination dictates whether K63 hyperubiquitination resulting
from compromised A20 deubiquitinase activity promotes pro-inflammatory
and cell survival signalling (Extended Data Fig. 4d, f) or cell
death (Fig. 3a, Extended Data Fig. 7b). Supporting this notion,
compromising TNFR1 K256 ubiquitination, which is primarily linear (Fig.
1c, Extended Data Fig. 5d), enhanced TNF-induced caspase activation
(Extended Data Fig. 7c). Interestingly, compromised TNFR1 K256
ubiquitination did not affect JNK, p38 or NF-κB signalling (Extended Data
Fig. 7d), suggesting that TNFR1 ubiquitination regulates cell death
whereas ubiquitination of TNFR1-associated proteins regulates
downstream kinases. This idea corroborates a report that K63-linked
ubiquitination of undefined TNFR1 residues regulates cell death but
does not affect downstream kinase activity. However, in that scenario
attenuating TNFR1 ubiquitination blocked cell death; thus additional
characterization of how TNFR1 ubiquitin modifications regulate
signalling is warranted.

We profiled cellular signalling complexes to understand how A20
deubiquitinase activity and linear ubiquitination collaborate to regulate
cell death. Caspase 8 is an apical protease in the TNF-induced cell death
cascade, and its cleavage was detected earlier in Tnfaip3OTU/OTU MEFs
than in wild-type MEFs treated with HOIP RNA interference
(RNAi; Fig. 3b), in keeping with more robust recruitment of caspase 8 to
activated TNFR1 (Fig. 3c). Because engagement of TNFR1 is reported
to promote recruitment of caspase 8 to a cytoplasmic “Complex II”
rather than to TNFR1 at the plasma membrane (Complex I),
we confirmed the specificity of our caspase 8 antibodies (Extended
Data Fig. 7e) and analysed endogenous TNFR1 signalling complexes
by mass spectrometry. Full-length caspase 8 was detected within the
TNFR1 signalling complex along with receptor internalization components
(Extended Data Figs 5a and 7f, g, and not shown). Thus caspase 8
may be recruited to TNFR1 complexes at the plasma membrane or to
those that are endocytosed; such recruitment is enhanced in cells with
compromised A20 deubiquitinase activity and linear ubiquitination.
The Fas-associated death domain (FADD) is an adaptor critical for
assembly of both Complex I and Complex II, and cleaved caspase
8 was most efficiently recruited to FADD in cells lacking both A20
deubiquitinase activity and linear ubiquitination, as were RIPK3 and
ubiquitinated RIPK1 (Fig. 3d). The presence of TNFR1
in anti-FADD immunoprecipitates substantiates interactions between
membrane-bound and cytoplasmic cell death signalling components
(Fig. 3d). Association of K63-ubiquitinated RIPK1 with the FADD-containing
cell death complex was transient in Tnfaip3OTU/OTU MEFs but
more robust in Tnfaip3OTU/OTU MEFs after HOIP knockdown, coinciding
with enhanced caspase activation (Extended Data Fig. 7h). Thus
enhanced K63-linked polyubiquitination of TNFR1 complex proteins
resulting from A20 deficiency (Extended Data Figs 5e, f and 7h) facilitates
assembly of either pro-survival or cell-death-inducing signalling
complexes. Which complex prevails is dictated by linear ubiquitination:
sufficient linear ubiquitination preserves the architecture of the
pro-survival signalling complex for enhanced TAK1 and also IKKβ
recruitment (Fig. 1a) and protracted downstream signalling (Extended
Data Fig. 4d, f), whereas compromised linear ubiquitination favours
association of TNFR1 and K63-hyperubiquitinated RIPK1 with cell
death machinery and activation of cell death (Fig. 3, Extended Data
Fig. 7b, c, f–h and diagram in Extended Data Fig. 1b, c).


Linear ubiquitination prohibits A20 depolymerization of ubiquitin scaffolds

Next, we investigated how linear ubiquitination preserves the architecture
of the TNFR1 pro-survival signalling complex. A20 cannot
depolymerize linear ubiquitin chains (Extended Data Figs 5f and 6c,
g), so linear ubiquitination could prohibit A20-mediated disassembly
of the complex. Deubiquitinase profiling confirmed that endogenous
TNFR1 and RIPK1 are modified by K63-linked chains, which
are depolymerized by A20, and by linear chains, which are depolymerized
by OTULIN (Fig. 4a). The most complete deubiquitination was
achieved by OTULIN and A20 together (Fig. 4a). To evaluate whether linear
Figure 3 | A20 regulates TNF-induced cell death in collaboration
with linear ubiquitination. a, Wild-type or Tnfaip3OTU/OTU MEFs
transfected with control or HOIP RNAi oligonucleotides were treated
with TNF. Ethidium homodimer-1-labelled dead cells were normalized
to cell density. Mean values (n = 3) ± s.e.m. b, Wild-type or Tnfaip3OTU/OTU
MEFs transfected with control or HOIP RNAi oligonucleotides
were treated with TNF and cell lysates analysed by immunoblotting.
c, Wild-type or Tnfaip3OTU/OTU control or HOIP RNAi-treated MEFs were


chains prohibit A20 from deubiquitinating K63-ubiquitinated TNFR1
signalling components, we polymerized linear ubiquitin onto a K63-linked
tetraubiquitin chain that was ligated to a haemagglutinin (HA)-conjugated
RIPK1 peptide (Fig. 4b, Extended Data Fig. 6b). Whereas A20
depolymerized ubiquitin from K63-ubiquitinated HA-RIPK1, negligible
deubiquitination occurred when the substrate was modified with
branched linear chains (Fig. 4c, Extended Data Fig. 6a). Thus linear
ubiquitination preserves the architecture of signalling complexes by
physically prohibiting A20-mediated dissolution of K63-polyubiquitin
scaffolds. Importantly, Tnfaip3OTU/OTU mice were also sensitized to
lipopolysaccharide (LPS) challenge, as reflected by enhanced lethality
and elevated serum cytokines, and their isolated cells showed protracted
MAPK and NF-κB activation with persistence of K63-polyubiquitinated TRAF6
(Extended Data Fig. 8a, b, d). These data differ from a previous report,
perhaps owing to the doses of LPS used. Given that branched
K63/linear chains are assembled on activated interleukin-1-receptor
signalling components, the mechanism by which linear ubiquitination
prohibits deubiquitination of K63-polyubiquitin scaffolds and preserves
the architecture of signalling complexes may be more broadly
applicable to other signalling complexes regulated by linear
ubiquitination and A20-like deubiquitinases.


A20 ZnF4 attenuates TNF signalling 

Having investigated mechanisms by which A20 OTU activity regulates 
TNERI signalling, we next focused on the ZnF4 motif. A20 ZnF4 selec- 
tively binds K63-linked polyubiquitin chains’; nevertheless, both A20 
ZnF4 mutants and wild-type A20 were recruited similarly to TNFR1 
in Tnfaip34O49* and in Tnfaip34U"/4 cells (Fig. 5a, Extended 
Data Fig. 4b, c). This prompted us to define a recruitment mechanism 


treated with Flag~TNE, engaged receptor complexes and cell lysates 

were analysed by immunoblotting. UnRx, untreated. d, Lysates from 
TNF-treated WT or Tnfaip3°/°™ control or HOIP RNAi-treated MEFs 
were immunoprecipitated using anti-FADD antibody; immunoprecipitates 
and cell lysates were immunoblotted as indicated. HOIP knockdown was 
validated by PCR with reverse transcription (RT-PCR; not shown). 

Gel source data are in Supplementary Figs 3, 4. Data represent two to four 
biological replicates. 


independent of ZnF4. A20 ZnF4 Cys residues support the ZnF4 struc- 
ture’, and X-ray crystallography revealed that ZnF4 simultaneously 
binds to three monoubiquitins at ZnF4 sites I, I, and III (ref. 9). A20 
ZnF7 also binds ubiquitin, but the relative ubiquitin-binding
affinities of ZnF4 and ZnF7 are unknown. We prepared nine individual
15N-labelled human A20 ZnF proteins to quantify and directly compare
monoubiquitin binding using NMR spectroscopy. ZnF7 bound
monoubiquitin with a Kd similar to those of the ZnF4 ubiquitin-binding
sites, and mutations in key ubiquitin-binding residues suppressed
binding (Extended Data Fig. 9a, Supplementary Information f–h). ZnF1
displayed negligible binding; thus, ubiquitin binding is not a universal
property of A20 zinc fingers (Extended Data Fig. 9a, Supplementary
Information f). Because ZnF4 and ZnF7 monoubiquitin binding
did not explain recruitment of A20 ZnF4(C609A,C612A) or A20
ZnF4(Y599A,F600A) to TNFR1, we used biolayer interferometry to
measure binding affinities of A20 ZnF motifs for triubiquitin chains.
A20 ZnF7 bound linear triubiquitin approximately 400 times more
effectively than ZnF4 bound K63-linked triubiquitin (Fig. 5b, Extended
Data Fig. 9a). Thus, in the absence of a functional ZnF4, ZnF7 is
likely to direct A20 recruitment to the TNFR1 signalling complex via
linear polyubiquitin binding. These data corroborate TNFAIP3 mutational
analyses in haematological malignancies and a role for ZnF7 in
attenuating TNF-induced apoptosis: ZnF7 mutants are probably not
recruited to TNFR1, thus mimicking an A20-inactivated, TNF-sensitive
phenotype. Supporting this idea, knockdown of HOIP reduced linear
but not K63-linked ubiquitination of TNFR1 (Extended Data Fig. 7a)
and attenuated A20 recruitment to TNFR1 (Fig. 5c). These data are in
contrast to another study, which reported inefficient recruitment of murine A20 ZnF4(C609A,C612A) to activated TNFR1 (ref. 10). However, the authors isolated TNFR1 with anti-TNFR1 antibodies that do not effectively immunoprecipitate ubiquitinated TNFR1 (Extended Data Fig. 9b). Therefore, a significant fraction of the ubiquitinated and activated TNFR1 complex could have been inadvertently excluded from their analysis.

Figure 4 | Linear ubiquitination prohibits A20 disassembly of the TNFR1 signalling complex. a, Flag-TNF-purified elutions from Tnfaip3^OTU/OTU MEFs were treated with recombinant deubiquitinases and immunoblotted as indicated. b, Immunoblot analysis of an HA-RIPK1 peptide modified with a K63-linked Ub4 chain or subsequently modified with linear ubiquitin chains. c, Cleavage of K63-linked Ub4 ligated to an HA-RIPK1 peptide, with or without linear ubiquitination, by E. coli-derived, IKKβ-phosphorylated A20. Gel source data are in Supplementary Figs 4, 5. Data represent two to three biological replicates.
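The roughly 400-fold preference of ZnF7 for linear triubiquitin over ZnF4 for K63-linked triubiquitin is a ratio of equilibrium dissociation constants (Kd) read from steady-state binding curves like those in Fig. 5b. The fitting procedure is not described here, so the following is only a minimal Python sketch, assuming a simple 1:1 binding model, R = Rmax·C/(Kd + C), and using noise-free curves generated from hypothetical Kd values (0.5 µM and 200 µM), not data from this study. Because the model is linear in Rmax for a fixed Kd, Rmax is solved in closed form and Kd is found by a grid search over log10(Kd); the function name `fit_one_site` is illustrative.

```python
def fit_one_site(conc, resp, lo=-9.0, hi=-3.0, steps=6001):
    """Fit R = Rmax*C/(Kd + C) by grid search over log10(Kd) (molar units).

    For a fixed Kd the model is linear in Rmax, so the least-squares Rmax
    has a closed form; the Kd with the smallest residual sum of squares wins.
    """
    best = None
    for i in range(steps):
        kd = 10 ** (lo + (hi - lo) * i / (steps - 1))
        occupancy = [c / (kd + c) for c in conc]  # fraction of sites bound
        rmax = (sum(x * r for x, r in zip(occupancy, resp))
                / sum(x * x for x in occupancy))
        sse = sum((r - rmax * x) ** 2 for x, r in zip(occupancy, resp))
        if best is None or sse < best[0]:
            best = (sse, kd, rmax)
    return best[1], best[2]


# Hypothetical, noise-free titrations (two-fold series from 10 nM).
conc = [1e-8 * 2 ** k for k in range(16)]
tight = [c / (5e-7 + c) for c in conc]   # generated with Kd = 0.5 uM
weak = [c / (2e-4 + c) for c in conc]    # generated with Kd = 200 uM

kd_strong, _ = fit_one_site(conc, tight)
kd_weak, _ = fit_one_site(conc, weak)
print(f"Kd ratio: {kd_weak / kd_strong:.0f}-fold")
```

With real biolayer interferometry data, replicate averaging and a nonlinear least-squares routine would normally replace the grid search; the grid simply keeps the sketch free of third-party dependencies.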

Effective A20 recruitment to activated TNFR1 is therefore insufficient for homeostatic A20 activity; the functional integrity of ZnF4 is also important. Although TNFR1-associated RIPK1 was increased in cells homozygous for either ZnF4 mutation (Fig. 1b, Extended Data Fig. 4c), K48-ubiquitinated RIPK1 associated with TNFR1 was decreased (Extended Data Fig. 9c). Thus, a functional A20 ZnF4 motif is required for K48-polyubiquitination of endogenous RIPK1. A20 ZnF4 also directed K48-linked polyubiquitination of TNFR1 in vitro (Extended Data Fig. 9d). Because levels of RIPK1 and TNFR1 are increased within the proximal TNFR1 signalling complex in ZnF4-mutant cells (Fig. 1b, Extended Data Fig. 4c) and K48-linked ubiquitin chains direct proteasomal degradation, our data are consistent with a role for A20 ZnF4 in promoting K48-linked ubiquitination of RIPK1 and TNFR1, and their subsequent degradation, to limit TNF signalling. Because IKKβ-mediated phosphorylation enhanced A20 K63 deubiquitinase activity (Fig. 2b–d, Extended Data Fig. 6a, e), we evaluated whether phosphorylation also enhanced A20 ubiquitin ligase activity. IKKβ-phosphorylated A20 promoted more autoubiquitination than untreated A20 (Fig. 5d). IKKβ alone had no ligase activity (Extended Data Fig. 9e), and mass spectrometry of recombinant IKKβ did not reveal contaminating proteins of insect, viral or human origin that could mediate ligase or deubiquitinase function (not shown). Although biochemical and structural studies report A20 ZnF4 ubiquitin ligase activity, it remains unclear whether in vitro A20 autoubiquitination is a phenomenon common to ubiquitin-binding motifs or whether A20 ZnF4 has ubiquitin ligase function, that is, the ability to transfer ubiquitin from a charged E2 enzyme to a substrate Lys residue and form polyubiquitin chains. IKKβ-phosphorylated A20, but not IKKβ alone, transferred ubiquitin onto the Lys residue in the HA-RIPK1 peptide to generate polyubiquitinated substrate (Fig. 5e, Extended Data Fig. 6b). Thus, the A20 ZnF4 motif has ubiquitin ligase activity that is enhanced by IKKβ phosphorylation and is critical for attenuating TNFR1 signalling. Intriguingly, the A20 ZnF4 substrates and pathways affected by ZnF4 inactivation are selective: in contrast to Tnfaip3^OTU/OTU mice, mice homozygous for either ZnF4 mutation were unaffected by LPS challenge (Extended Data Fig. 8c, d). Accordingly, cellular signalling and TRAF6 ubiquitination in ZnF4-mutant cells remained unperturbed by LPS relative to wild-type cells (Extended Data Figs 1a, 8e).

Figure 5 | A20 ZnF4 ubiquitin-binding is required for attenuating TNF signalling. a, Analysis of TNFR1-associated A20 from Flag-TNF-treated wild-type or ZnF4-mutant MEFs. b, Equilibrium binding curves of human A20 ZnFs with Ub trimer (Ub3). Average response values (n = 3) ± standard deviations. c, Wild-type MEFs transfected with control or HOIP RNAi oligonucleotides were treated with Flag-TNF; anti-Flag immunocomplexes or cell lysates were analysed by immunoblotting. d, Ubiquitination efficacy of E. coli-derived wild-type A20 or IKKβ-phosphorylated A20 with or without the E2 UbcH5a. e, E. coli-derived, IKKβ-phosphorylated A20 promotes polyubiquitination of the HA-RIPK1 peptide. Gel source data are in Supplementary Fig. 5. Data represent two to four biological replicates.


Discussion 

Our studies characterize the physiological consequences of A20
OTU and ZnF4 domain inactivation. Because inactivating mutations
in the OTU or ZnF4 domains are hypomorphic, simultaneous inactivation
of the OTU and ZnF4 domains, and/or the ZnF7 motif, may be
required to fully incapacitate A20 and phenocopy Tnfaip3−/− mice. We
find that A20 phosphorylation promotes K63-linked ubiquitin chain
cleavage by the OTU domain and enhances ubiquitin transfer by the
ZnF4 motif. Additional studies are required to understand how
phosphorylation enhances A20 ubiquitin editing. Although the OTU and ZnF4
domains direct opposite enzymatic functions, TNF treatment yields
similar signalling profiles in A20 OTU- or ZnF4-domain mutant cells
via two distinct mechanisms: more K63-ubiquitinated proteins accumulate
in Tnfaip3^OTU/OTU cells owing to insufficient deubiquitination,
whereas insufficient degradation of K63-ubiquitinated proteins leads
to their accumulation in cells homozygous for either ZnF4 mutation
(Extended Data Fig. 1a).

We also show that TNFR1 ubiquitination and turnover are regulated
by the A20 OTU and ZnF4 domains. Stabilization of ligand-engaged
TNFR1 could explain why TNFAIP3 mutations are correlated with
patient responses to anti-TNF therapies: elevated levels of activated
TNFR1 are a liability that can be exploited by TNF-neutralizing agents.
Our finding that linear ubiquitination dictates whether A20 insufficiency
enhances inflammatory signalling or cell death reveals an additional
mechanism by which A20 regulates pathogenesis. Both outcomes
can promote local or systemic inflammation in vivo, underscoring
why inactivating TNFAIP3 mutations are associated with critical
inflammatory and autoimmune syndromes.


Online Content Methods, along with any additional Extended Data display items and 
Source Data, are available in the online version of the paper; references unique to 
these sections appear only in the online paper. 
METHODS

Generation of mice carrying the Tnfaip3 C103A, C609A/C612A, or Y599A/ 
F600A knock-in alleles. The three constructs for targeting the Tnfaip3 locus in 
embryonic stem (ES) cells were made using recombineering and/or standard 
molecular cloning techniques. The Tnfaip3 C103A knock-in (KI) construct
corresponds to the following genomic position (all coordinates from the
NCBI37/mm9 assembly, complementary strand): chr10:18725681–18733460. The
C103A mutation (TGC to GCC) is in exon 3, and a loxP-Neo-loxP cassette was
inserted at a position corresponding to chr10:18729002. The Tnfaip3 C609A/C612A
KI construct corresponds to chr10:18722957–18730583. The C609A/C612A double
mutation (TGCACTCTATGT to GCTACTCTAGCT) is in exon 7, and a loxP-Neo-loxP
cassette was inserted at a position corresponding to chr10:18726067. The
Tnfaip3 Y599A/F600A KI construct corresponds to chr10:18721314–18727086;
the Y599A/F600A double mutation (TATTTT to GCTGCA) is located in exon 7,
and an Frt-Neo-Frt selection cassette was inserted at a position corresponding
to chr10:18724150. The vectors were confirmed by DNA sequencing, linearized and
used to target C57BL/6 C2 ES cells using standard methods (G418 positive and 
ganciclovir negative selection). Positive clones were identified using Southern blot 
and/or PCR, quantitative PCR, and sequencing. Correctly targeted ES cells were 
transfected with a Cre or Flpe plasmid, respectively, to remove the Neo selection 
marker and to create ES cells with the final Tnfaip3 KI alleles. KI ES cells were 
injected into blastocysts using standard techniques, and germline transmission was 
obtained after crossing resulting chimaeras with C57BL/6N females. 
Genotyping wild-type and Tnfaip3 KI mice. Genomic DNA was extracted from 
tails using the Qiagen DNeasy Blood and Tissue Kit according to the manufac- 
turer’s instructions. Genomic DNA was amplified using the Invitrogen GeneAmp 
Fast PCR Master Mix following standard procedures. 

Primer sequences and PCR products are as follows: 

A20 OTU(C103A): Expected PCR product sizes: WT, 321 bp; KI/MUT, 383 bp.

Reverse primer: CCTCCAGTGCATTCTGAGGAATCTC. 

Forward primer: AAGCATGCACGATGAAGGAGC. 

A20 ZnF4(C609A,C612A): Expected PCR product sizes: WT, 738 bp; KI/MUT, 873 bp.

Reverse primer: GCCTTGCACAGGGATCTCCAT. 

Forward primer: CACTCTCATGGTGTCCTTCTGAGATG. 

A20 ZnF4(Y599A,F600A): Expected PCR product sizes: WT, 420 bp; KI/MUT, 454 bp.

Primer1: TCTCACTCCACACTCTTG. 

Primer2: TTCAGACCGAAGTTCCTAT. 

Primer3: T@GGCTACATAATGGGTTTA. 

In vivo cytokine challenge studies and data analysis. All staff participating in 
animal work abided by the laws and regulations as stated in the Animal Welfare 
Act and The Guide for the Care and Use of Laboratory Animals. This study 
did not unnecessarily duplicate previous experiments. Alternatives to the use 
of animals for this study were considered and none existed or were acceptable. 
All scientists and technicians were trained in the proper procedures for animal 
handling, animal techniques, the administration of anaesthesia and analgesia, 
and the methods of euthanasia used in these studies. All animal study protocols 
were reviewed and approved by the Genentech Institutional Animal Care and 
Use Committee (IACUC), and mice were generated and housed in a clean rodent
facility on-site at Genentech Dixon and South San Francisco campuses in standard
rodent micro-isolator cages. Sample size was determined based on historical data
and knowledge of variability within the cytokine challenge models. Animals were 
selected based on body weight, age, and genotype and subsequently randomized 
based on body weight, with animals of extremes of body weight or age excluded 
from the study. The investigators were not blinded to allocation during experi- 
ments and outcome assessment. 

For TNF challenge studies, 300 µg murine TNF per kg body weight (Genentech,
Inc.) in PBS was injected intravenously. Female C57BL/6 mice between 2 and 4 months
of age were used for all studies with Tnfaip3^OTU/OTU and Tnfaip340"9* animals.
For studies with Tnfaip30"% and Tnfaip34""" animals, 9-week-old male
C57BL/6 mice were used, TNF or PBS was dosed sequentially by alternating
wild-type and mutant mice, and samples were collected in the same order.
Serum was collected at the indicated time points from three mice of each
indicated genotype for Luminex multiplex cytokine analysis. The rectal
temperatures of 3–4 mice per genotype were also recorded at the indicated
time points. Dosing of TNF or PBS was blinded, and data were collected in a
blinded manner.
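Weight-normalized doses such as the 300 µg TNF per kg used here (or the 5, 20 and 40 mg LPS per kg doses below) translate into a per-mouse amount by multiplying by body weight. A minimal Python sketch of that arithmetic follows; the body weights and the stock concentration are hypothetical examples, and the helper names `dose_per_mouse` and `injection_volume_ul` are illustrative.

```python
def dose_per_mouse(dose_per_kg, body_weight_g):
    """Amount to inject for a weight-normalized dose.

    dose_per_kg may use any mass unit per kg (e.g. ug/kg); the result is in
    the same mass unit. Body weight is given in grams.
    """
    return dose_per_kg * body_weight_g / 1000.0


def injection_volume_ul(amount, stock_per_ml):
    """Volume (ul) of a stock at stock_per_ml (same mass unit per ml) holding `amount`."""
    return amount / stock_per_ml * 1000.0


# Hypothetical 20 g mouse at 300 ug TNF per kg: 6 ug TNF.
tnf_ug = dose_per_mouse(300, 20)
# Drawn from a hypothetical 30 ug/ml TNF stock in PBS: 200 ul.
tnf_volume_ul = injection_volume_ul(tnf_ug, 30)
# Hypothetical 25 g mouse at 20 mg LPS per kg: 0.5 mg LPS.
lps_mg = dose_per_mouse(20, 25)
```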

LPS (Sigma L3012) in PBS was administered by intraperitoneal injection for
challenge studies. For mortality studies, mice were challenged with 20 or 40 mg
LPS per kg body weight, and both log-rank (Mantel-Cox) and Wilcoxon tests
were used to calculate P values.
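For readers unfamiliar with the log-rank (Mantel-Cox) test, a minimal pure-Python sketch follows. At each distinct death time it compares observed with expected deaths in one group, given the numbers still at risk, and refers the summed statistic to a chi-square distribution with one degree of freedom. The survival times are invented for illustration and are not data from this study; the function name is illustrative, and the Wilcoxon (Gehan-Breslow) variant, which weights early deaths more heavily, is not shown.

```python
import math


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank (Mantel-Cox) test.

    times_*: time of death or censoring for each animal.
    events_*: 1 if the animal died at that time, 0 if censored.
    Returns (chi-square statistic with 1 df, two-sided P value).
    """
    data = ([(t, e, 0) for t, e in zip(times_a, events_a)]
            + [(t, e, 1) for t, e in zip(times_b, events_b)])
    death_times = sorted({t for t, e, _ in data if e})
    observed_a = expected_a = variance = 0.0
    for t in death_times:
        n_a = sum(1 for ti, _, g in data if ti >= t and g == 0)  # at risk, group A
        n_b = sum(1 for ti, _, g in data if ti >= t and g == 1)  # at risk, group B
        d_a = sum(1 for ti, e, g in data if ti == t and e and g == 0)
        d = sum(1 for ti, e, _ in data if ti == t and e)          # all deaths at t
        n = n_a + n_b
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    chi2 = (observed_a - expected_a) ** 2 / variance
    # Upper tail of chi-square with 1 df: P = erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


# Hypothetical example: all of group A die within 36 h; group B survives to 72 h (censored).
chi2, p = logrank_test([12, 18, 24, 24, 36], [1, 1, 1, 1, 1], [72] * 5, [0] * 5)
```

The sketch assumes at least one death occurs with animals from both groups at risk; otherwise the variance is zero and the statistic is undefined.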


For high-dose LPS challenge studies, serum was collected at the indicated time 
points from three mice each of the indicated genotype for Luminex multiplex 
cytokine analysis in response to 40 mg LPS per kg body weight. For low-dose LPS
challenge studies, 4 or 5 mice per genotype were injected with PBS or 5 mg LPS
per kg body weight in PBS. Serum was collected at 2 h or 6 h post-injection for
Luminex multiplex cytokine analysis. A two-tailed Student's t-test was used to
calculate P values. 

Antibodies and reagents. Antibodies to the indicated proteins were purchased
from the specified vendors, with catalogue or clone numbers indicated in brackets.
Anti-FADD (Ab52935), anti-caspase 8 (Ab138485) (Abcam); anti-caspase 8
(ALX804448-C100) (Enzo); anti-RIPK1 (610459) (BD Biosciences); biotinylated
anti-murine TNFR1 (BAF425), hamster anti-murine TNFR1 (55R170), anti-TAK1
(491840), anti-IKKG3 (725818) (R&D Systems); anti-A20 (5630), anti-IκBα (9242),
anti-phospho-IκBα (5A5), anti-JNK (56G8), anti-phospho-JNK (81E11), anti-p38
(9212), anti-phospho-p38 (D3F9), anti-phospho-MKK3 (12280), anti-total MKK3
(8535), anti-phospho-MKK4 (4514), anti-total MKK4 (9152), anti-mouse-specific
caspase 8 (4927), anti-cleaved caspase 8 (8592), anti-human caspase 8 (9746),
anti-cleaved caspase 3 (9664), anti-human caspase 3 (9662), anti-PARP (9541) (Cell
Signaling Technology); anti-β-tubulin (clone DM1B) (MP Biomedicals); anti-RIPK3
(NBP1-77299) (Novus); anti-TRAF2 (sc-7346), anti-TRAF6 (sc-7221),
anti-actin-HRP (sc-1616), anti-ubiquitin and ubiquitin-HRP (clone P4D1), mouse
anti-human TNFR1 (sc-8436), anti-TAK1 (sc-7162) (Santa Cruz Biotechnology);
anti-HA-HRP (clone HA-7), anti-TRADD (SAB44503461), anti-RNF31 (HOIP)
(SAB2102031), anti-Flag M2 affinity gel (A2220) (Sigma). Anti-K11 polyubiquitin,
anti-K48 polyubiquitin, anti-K63 polyubiquitin, anti-linear polyubiquitin,
anti-FADD (9274) and non-production grade Trastuzumab antibodies were produced
at Genentech. Recombinant human TNF was produced at Genentech. Triubiquitin
and tetraubiquitin chains of the indicated linkages were generated at Genentech.
Recombinant human Flag-TNF was purchased from Enzo Life Sciences or was
produced at Genentech. LPS was purchased from Sigma, and recombinant murine
GM-CSF and M-CSF were purchased from R&D Systems.

Cell culture. MEFs were generated from E14 embryos and in some cases were
immortalized by retroviral transduction with pWZL-hygro E1A (a kind gift
from Scott Lowe). Tnfaip3−/− MEFs were a kind gift from Averil Ma. Primary
and immortalized MEFs were cultured in FMA medium (DMEM supplemented
with 10% heat-inactivated fetal bovine serum, 100 µM non-essential amino
acids (Invitrogen), 50 µM 2-mercaptoethanol, 1% penicillin/streptomycin
and 1% L-glutamine). A20 macrophage progenitor cells were generated by
isolating bone marrow cells from the femurs and tibias of 6–8-week-old mice,
followed by erythrocyte lysis and enrichment using a murine haematopoietic
progenitor cell enrichment kit as directed by the manufacturer (19756; Stem Cell
Technologies). Cells were then immortalized using conditional HoxB8. Progenitor
cells were cultured in RPMI-1640 medium supplemented with 10% heat-inactivated
fetal bovine serum, 2 mM L-glutamine, 20 ng ml−1 GM-CSF and
1 µM β-oestradiol. Differentiation to macrophages was induced by removal of β-oestradiol.
Primary BMDMs were generated by isolating bone marrow cells from mice as
described above and were cultured in FMA medium supplemented with 20 ng ml−1
M-CSF for 7 days. Primary and immortalized cells were characterized by genotyping
and western blotting as described, and were tested for mycoplasma contamination.
HEK293T cells were authenticated following Genentech's "Guidelines for
Maintaining the Integrity of Cell Line Stocks" as described previously and were
cultured in DMEM supplemented with 10% fetal bovine serum, 1% penicillin/
streptomycin and 1% L-glutamine.

In vitro cytokine activation treatments. For signalling pathway profiling studies,
MEFs or BMDMs (primary or matured from HoxB8-ER-immortalized progenitors)
were treated with 20 ng ml−1 human TNF or 100 ng ml−1 LPS for 30–45 min to
induce A20 expression, washed three times with PBS, and cultured in FMA or
RPMI medium until collection at the indicated time points. Alternatively, primary
BMDMs and MEFs were treated with 20 ng ml−1 to 1 µg ml−1 LPS or human TNF
(with or without the Flag epitope fusion, as indicated) until collection at the
indicated time points. For analysis of signalling complexes, cells were treated with
20 µM MG-132 immediately before collection (untreated controls) or immediately
before treatment with TNF or Flag-TNF as indicated, or with 10 µg ml−1 LPS, and
collected at the indicated time points.

Western blot analysis and immunoprecipitations. Cells were treated as described
and lysed in TNFR1 lysis buffer (20 mM Tris pH 7.5, 150 mM NaCl, 1% Triton X-100,
1 mM EDTA, 50 mM NaF, 10 mM N-ethylmaleimide, complete protease inhibitor
tablets (Roche), phosphatase inhibitor cocktail-1 and -3 (Sigma), and 25 µM
MG-132) containing 6 M urea. Lysates were reduced, alkylated and processed
as previously described. To evaluate TRAF6 ubiquitination status, cells were first
treated as detailed above and lysed in LPS lysis buffer (20 mM HEPES pH 7.6,
150 mM NaCl, 1.5 mM MgCl2, 2 mM EDTA, 0.5% Triton X-100, 10 mM NaF, 2 mM
DTT, 10 mM N-ethylmaleimide, complete protease inhibitor tablets (Roche),
phosphatase inhibitor cocktail-1 and -3 (Sigma), and 25 µM MG-132) containing
6 M urea at the indicated time points. Protein concentrations were quantified and
200 µg was reserved for western blot analysis. Normalized lysates from each sample
were subsequently diluted to 4 M urea with LPS IP buffer, and samples were
pre-cleared with protein A beads and 2 µg non-production grade Trastuzumab per mg
total protein. Pre-cleared lysates were subsequently immunoprecipitated with
1 µg anti-K63 antibody per mg total protein overnight; immunocomplexes were
captured with 10 µl Protein A beads (Sigma) per mg total protein, washed with
LPS IP buffer, and processed for western blot analysis. To evaluate RIPK1 or 
TNFR1 ubiquitination status, cells were first treated as detailed above using TNF
or Flag-TNF and lysed in TNFR1 lysis buffer at the indicated time points. Protein
concentrations were quantified and 200 µg was reserved for western blot analysis.
Normalized lysates from each sample were pre-cleared with protein A+G beads
(Pierce) or with anti-IgG beads (Sigma). Pre-cleared lysates were immunoprecipitated
with 5 µg anti-murine TNFR1 antibody pre-coupled to 50 µl protein A+G
beads or with anti-Flag beads (Sigma), washed, and the immunoprecipitates
were dissociated from the beads with 6 M urea. The eluted proteins were
subsequently diluted to 4 M urea for anti-K63 ubiquitin IPs, or remained undiluted for
anti-linear ubiquitin IPs, in ubiquitin chain lysis buffer (20 mM Tris-Cl pH 7.5,
135 mM NaCl, 1.5 mM MgCl2, 1% Triton X-100, 1 mM EGTA, 10% glycerol, 50 mM
NaF, 10 mM N-ethylmaleimide, complete protease inhibitor tablets (Roche),
phosphatase inhibitor cocktail-1 and -3 (Sigma), and 25 µM MG-132) and
pre-cleared with protein A beads and 2 µg non-production grade Trastuzumab per mg
total protein, followed by immunoprecipitation with 1 µg anti-K63 or anti-linear
antibody per mg total protein overnight, and immunocomplexes were captured
with 10 µl Protein A beads (Sigma) per mg total protein. Immunoprecipitates were
then washed and processed for western blot analysis. For analysis of Flag-TNF-activated
TNFR1 signalling complexes, cells were first treated as described above
for the indicated time points. Cells were immediately washed with PBS and lysed at
4 °C in TNFR1 lysis buffer and in some cases stored at −80 °C. Lysates were cleared
by centrifugation, pre-cleared with mouse IgG agarose (Sigma), and normalized
amounts of lysates were immunoprecipitated with anti-Flag affinity gel (Sigma)
overnight. Immunoprecipitates were washed once with wash buffer #1 (20 mM
HEPES pH 7.9, 420 mM NaCl, 1.5 mM MgCl2, 0.2 mM EDTA, 25% glycerol,
complete protease inhibitor cocktail) and four times with wash buffer #2 (20 mM
Tris pH 7.4, 20% glycerol, 0.2 mM EDTA, 300 mM NaCl, 0.1% NP-40, complete
protease inhibitor cocktail), rotating for at least 10 min during each wash in buffer #2.
Samples were eluted with 500 µg ml−1 3×Flag peptide (Sigma), concentrated,
and prepared for western blot analysis. For analysis of FADD-associated signalling
complexes, cells were treated with RNAi oligonucleotides as outlined in the 'RNAi
treatments and transfections' section and treated with TNF as outlined above.
Cells were immediately washed with PBS and lysed at 4 °C in TNFR1 lysis buffer
as described above containing 10 µM Z-VAD (R&D). Lysates were cleared by
centrifugation, protein concentrations were quantified and 200 µg was reserved
for western blot analysis. Normalized amounts of lysates were pre-cleared with
Protein G beads (Sigma) and immunoprecipitated with anti-FADD antibody
(Genentech) overnight. Immunocomplexes were captured with Protein G affinity
gel matrix (Sigma), washed with TNFR1 IP buffer, and processed for western
blot analysis. For analysis of K63-polyubiquitinated RIPK1 associated with FADD,
cells were treated and lysates were prepared and immunoprecipitated with anti-FADD
antibody and Protein G affinity gel matrix as outlined above. The washed
immunoprecipitates were dissociated from the beads with 6 M urea. The eluted
proteins were subsequently diluted to 4 M urea in ubiquitin chain lysis buffer and
pre-cleared with protein A beads and 2 µg non-production grade Trastuzumab per mg
total protein, followed by immunoprecipitation with 1 µg anti-K63 antibody
per mg total protein overnight, and immunocomplexes were captured with
10 µl Protein A beads (Sigma) per mg total protein. Immunoprecipitates were
then washed and processed for western blot analysis. For immunoprecipitations
using anti-linkage-specific ubiquitin antibodies, cells were treated as indicated
and lysed in ubiquitin lysis buffer with 6 M urea. Normalized lysates were either
left undiluted (for anti-linear ubiquitin immunoprecipitations) or diluted to 4 M urea
(for anti-K11, anti-K48 or anti-K63 ubiquitin immunoprecipitations). Anti-K63
immunoprecipitations were performed as described above; 2 µg anti-K11 antibody
per mg total protein was used for anti-K11 immunoprecipitations, and 5 µg
anti-K48 ubiquitin antibody (Genentech) plus 1 µg anti-K48 ubiquitin antibody
(CST) per mg total protein were used for anti-K48 immunoprecipitations. Samples
were incubated with antibodies overnight at 4 °C, captured with protein A
beads and processed as above.
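Several steps above scale immunoprecipitation reagents to total protein (for example 1 µg anti-K63 antibody, 2 µg non-production grade Trastuzumab and 10 µl Protein A beads per mg) and dilute 6 M urea eluates to 4 M. A minimal Python sketch of that bookkeeping follows; the helper names, the example protein amount and the eluate volume are hypothetical, and the dilution step assumes the diluent contains no urea.

```python
def ip_reagents(total_protein_mg, antibody_ug_per_mg=1.0,
                trastuzumab_ug_per_mg=2.0, beads_ul_per_mg=10.0):
    """Scale per-mg reagent ratios to the total protein in a lysate."""
    return {
        "antibody_ug": total_protein_mg * antibody_ug_per_mg,
        "preclear_trastuzumab_ug": total_protein_mg * trastuzumab_ug_per_mg,
        "protein_a_beads_ul": total_protein_mg * beads_ul_per_mg,
    }


def diluent_volume(volume, start_molar=6.0, target_molar=4.0):
    """Urea-free buffer to add so that start * V = target * (V + added)."""
    if target_molar >= start_molar:
        raise ValueError("target concentration must be below the starting one")
    return volume * (start_molar / target_molar - 1.0)


# Hypothetical 1.5 mg lysate: 1.5 ug anti-K63, 3 ug Trastuzumab, 15 ul beads.
reagents = ip_reagents(1.5)
# Hypothetical 300 ul eluate in 6 M urea: add 150 ul buffer to reach 4 M.
added_ul = diluent_volume(300.0)
```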

Induction of myelin oligodendrocyte glycoprotein (MOG) peptide 35-55 EAE. 
To select animals for the study, age and sex were matched between genotypes.
Animals were assigned to groups according to their genotype and thus
could not be randomized. Sample size was determined based on historical data and
knowledge of variability within the EAE model.

EAE was induced in 8–12-week-old female wild-type A20, A20 OTU(C103A)
or A20 ZnF4(C609A,C612A) mice. Briefly, animals were injected subcutaneously
on the back with a 200 µl emulsion of 300 µg MOG peptide in incomplete
Freund's adjuvant supplemented with 8 mg ml−1 Mycobacterium tuberculosis
(Difco Laboratories, Detroit, MI). The final dose of Mycobacterium tuberculosis
per mouse was 800 µg. Clinical scoring of disease was performed 3 times per week
starting at day 9. Cages were labelled with genotype information which, although
not referenced during scoring, was present and not hidden from the investigator.

Animals were assessed using a five-point system: 0 = no clinical disease,
1 = loss of tail tone only, 2 = mild monoparesis or paraparesis, 3 = severe
paraparesis, 4 = paraplegia and/or quadriparesis, and 5 = moribund or death.
Average daily clinical score (ADCS) was calculated as the area under the curve (AUC)
of the clinical score from the day of disease onset to the end of the study, divided
by the duration in days. Animals were identified by numbers, and the investigators
scoring the animals were blinded to the genotype/number correlations.
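The ADCS definition above can be written as ADCS = AUC(onset to end of study) / (end day − onset day). A minimal Python sketch follows, assuming that onset is the first scoring day with a score above 0 and that the AUC is computed with the trapezoidal rule (the integration method is not stated here); the function name and the scores are invented for illustration.

```python
def average_daily_clinical_score(days, scores):
    """ADCS: AUC of the clinical score from onset to study end, divided by duration.

    days: scoring days in ascending order; scores: clinical scores (0-5) on those days.
    Onset is taken as the first day with a score > 0 (an assumption).
    """
    onset = next((i for i, s in enumerate(scores) if s > 0), None)
    if onset is None:
        return 0.0  # no disease onset during the study
    d, s = days[onset:], scores[onset:]
    duration = d[-1] - d[0]
    if duration == 0:
        return float(s[0])  # onset on the final scoring day
    # Trapezoidal rule between consecutive scoring days.
    auc = sum((d[i + 1] - d[i]) * (s[i] + s[i + 1]) / 2 for i in range(len(d) - 1))
    return auc / duration


# Hypothetical animal scored three times per week from day 9; onset on day 11.
adcs = average_daily_clinical_score([9, 11, 13, 16, 18, 20], [0, 1, 2, 3, 3, 2])
```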

EAE specimen preparation and histopathology analysis. Spines and brain were 
collected at terminal euthanasia and fixed in 10% neutral buffered formalin. Spines 
were decalcified until trimmable. CNS was evaluated in 4 coronal sections (brain) 
and 5 to 8 transverse sections (spinal cord) stained with haematoxylin, eosin and 
Luxol fast blue. Lesion severity was scored on an arbitrary scale of 0 to 3 and 
reported as averages of all sections scored per animal. 

Mass spectrometric identification of ubiquitinated TNFR1. Profiling of
ubiquitination sites was performed using the PTMScan protocol (Cell Signaling Technology,
Danvers, MA). Briefly, wild-type and OTU-mutant MEF cells were treated with
TNFα for 15 min, and the treatment was stopped immediately by adding 9 M urea lysis
buffer (20 mM HEPES, pH 8.0, 9 M urea, 1 mM sodium vanadate, 2.5 mM sodium
pyrophosphate, 1 mM β-glycerophosphate). Lysates were reduced with dithiothreitol,
alkylated with iodoacetamide and digested with trypsin or a chymotrypsin+trypsin
combination at room temperature overnight. Resultant peptides were desalted on
Sep-Pak C18 cartridges (Waters, Milford, MA) and lyophilized for two days.
Immunoprecipitation of ubiquitinated peptides was performed using anti-K-ε-GG
antibody (Cell Signaling Technology, Danvers, MA). Peptides were eluted with
0.15% TFA, desalted, and further separated into 5 fractions by high-pH reversed-phase
chromatography on a STAGE tip. All peptides were analysed with a NanoAcquity UPLC
system (Waters, Milford, MA) directly coupled to an LTQ Orbitrap Elite mass
spectrometer (Thermo Scientific, San Jose, CA). Peptides were reconstituted in
0.1% formic acid (FA) with 2% acetonitrile (ACN), loaded onto a Symmetry C18
column (1.7 µm BEH-130, 0.1 × 100 mm, Waters) and separated with a 60-min
gradient from 0% to 15%, 0% to 20%, or 2% to 25% solvent B (0.1% FA, 98% ACN)
at a flow rate of 1 µl min−1. Peptides were eluted directly into the mass spectrometer
with a spray voltage of 1.2 kV. Full MS data were acquired in the FT analyser over
m/z 375–1,600 at a resolution of 60,000. The 15 most abundant ions in the full MS
scan were selected for MS/MS through a 2-Da isolation window.

Acquired MS/MS spectra were searched using Mascot (Matrix Science,
London, UK) with trypsin or trypsin+chymotrypsin enzyme specificity. Search
criteria included a full MS tolerance of 50 ppm and an MS/MS tolerance of 0.8 Da, with
oxidation (+15.9949 Da) of methionine and ubiquitination (+114.0429 Da) of
lysine as variable modifications and carbamidomethylation (+57.0215 Da) of
cysteine as a static modification. Data were searched against the mouse and
contaminant subset of the UniProt database together with the corresponding
reversed protein sequences. Identified TNFR1 ubiquitinated peptide-spectrum matches
(PSMs) were manually validated. Additional isotopically labelled (AQUA) peptides
corresponding to the identified TNFR1 ubiquitinated sequences were synthesized by
Cell Signaling Technology (Danvers, MA) to confirm the endogenous ubiquitination.
Abundance of ubiquitinated peptides was determined by adding equal amounts of
AQUA peptides to the wild-type and mutant MEF samples before immunoprecipitation,
followed by quantifying the area under the curve for the corresponding spiked-in
and endogenous peptides.
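Two parts of this workflow can be checked with simple arithmetic. The +114.0429 Da lysine modification corresponds to the diglycine (GG) remnant that trypsin leaves on ubiquitinated lysines (two glycine residues, C4H6N2O2), and carbamidomethylation adds C2H3NO; AQUA quantification then scales the known spiked-in amount by the ratio of endogenous to labelled peak areas. A minimal Python sketch, assuming standard monoisotopic element masses; the peak areas, spiked amount and m/z value are hypothetical, and the helper names are illustrative.

```python
# Monoisotopic element masses (Da).
MONO = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048, "O": 15.99491461956}


def formula_mass(counts):
    """Monoisotopic mass of an elemental composition such as {"C": 2, "H": 3}."""
    return sum(MONO[element] * n for element, n in counts.items())


# Diglycine remnant on lysine after tryptic digestion: 2 x C2H3NO.
gg_remnant = formula_mass({"C": 4, "H": 6, "N": 2, "O": 2})       # ~114.0429
# Carbamidomethylation of cysteine by iodoacetamide: C2H3NO.
carbamidomethyl = formula_mass({"C": 2, "H": 3, "N": 1, "O": 1})  # ~57.0215
# Methionine oxidation: one oxygen atom.
oxidation = formula_mass({"O": 1})                               # ~15.9949


def ppm_window(mz, ppm=50.0):
    """Lower and upper m/z bounds for a precursor tolerance given in ppm."""
    delta = mz * ppm / 1e6
    return mz - delta, mz + delta


def aqua_amount(area_endogenous, area_heavy, spiked_amount):
    """Endogenous peptide amount from the endogenous/heavy peak-area ratio."""
    return spiked_amount * area_endogenous / area_heavy


# Hypothetical peak areas with 50 fmol of heavy peptide spiked in: 25 fmol endogenous.
endogenous_fmol = aqua_amount(2.0e6, 4.0e6, 50.0)
```

Because equal amounts of AQUA peptide were added to wild-type and mutant samples, the ratio of the two `aqua_amount` results gives the relative abundance of each ubiquitinated site between genotypes.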

In vitro deubiquitinase reactions. Recombinant full length A20, with or with- 
out the indicated amino acid mutations, was expressed in E. coli and purified as 
described previously*”. Recombinant Flag-tagged full length A20, with or with- 
out the indicated amino acid mutations, was expressed in HEK-293T cells and 
purified as described previously’. The input of all A20 proteins for in vitro deu- 
biquitinase reactions was normalized by estimation of the protein concentrations 
on Coomassie blue-stained gels using serial dilutions of bovine serum albumin 
as standards. E. coli-derived A20 was phosphorylated with recombinant IK KS 
(Proqinase) in the following reaction: 51g A20, 1.75 jug GST-IK KB, 101M ATP, 
25mM Tris pH 7.5, 5mM {-glycerolphosphate, 1mM DTT, 0.1 mM Na3VOu,, 
10mM MgCh, 0.5% phosphatase inhibitor cocktail-3. A20 deubiquitination 
reactions were performed using 100 ng recombinant A20, 500 ng of the indicated
ubiquitin chain (unconjugated or conjugated to HA epitope-tagged RIPK1 peptide,
see below) and DUB reaction buffer (50 mM HEPES pH 8.0, 0.01% Brij-35 and
3 mM DTT), with or without phosphatase inhibitor cocktail, and were incubated
for the indicated times at 37 °C with agitation at 1,000 rpm. Ubiquitinated
TNFR1 or RIPK1 substrate was purified from Flag-TNF-treated A20 OTU(C103A) MEFs
as outlined above. OTULIN deubiquitinase reactions were performed following the
recommended UbiCREST protocol conditions (Boston Biochem). Samples were
subsequently processed for western blot analysis as described above and
immunoblotted as indicated.

Preparation of polyubiquitinated RIPK1 peptide. The sequence of the HA- 
RIPK1 peptide is as follows: 

YPYDVPDYASLEHPQEENEPSLQSKLQDEANYHLYGSRMDRQT-amide. 

The peptide was prepared at Genentech on a Protein Technologies Symphony
automated synthesizer using standard Fmoc chemistry protocols on Fmoc Rink amide
linker attached to TentaGel resin. Peptides were cleaved from the solid support
with trifluoroacetic acid:triisopropylsilane:water (95:2.5:2.5) for 2 h at room
temperature. Trifluoroacetic acid was evaporated, and peptides were precipitated
with ethyl ether, extracted with acetic acid, acetonitrile and water, and
lyophilized. Crude peptides were solubilized in dimethyl sulfoxide and purified
by reverse-phase chromatography on a C18 column using acetonitrile/water
buffers. Purified fractions were analysed by liquid chromatography mass
spectrometry (PE-Sciex), pooled and lyophilized. The calculated and found masses
were both 5111.3 Da. K48-linked or K63-linked tetraubiquitin chains were then
ligated to the peptide using the following reaction mix: 30 μM tetraubiquitin,
0.5 μM E1 (Boston Biochem), 5 μM cdc34 (for K48 tetraubiquitin; Genentech) or
2.5 μM each of UEV1 and UbcH13 (for K63 tetraubiquitin; Genentech), 50 mM Tris
pH 8.0 (Genentech), 10 mM ATP (Sigma), 10 mM MgCl2 (Genentech) and 6 mM DTT
(Sigma). Reactions were incubated with agitation for 4 h at 30 °C and then
overnight at 27.5 °C. Tetraubiquitin-conjugated peptides were purified and
concentrated using 30 kDa cutoff microspin columns (Amicon Ultra). K63-linked
tetraubiquitin chains ligated to the HA-peptide were subsequently modified with
linear ubiquitin chains using 30 μM linear tetraubiquitin (Genentech), 5 μM
UbcH5c (Boston Biochem), 1 μM His6-MBP HOIP catalytic domain (Boston Biochem),
50 mM Tris pH 8.0 (Genentech), 10 mM ATP (Sigma), 10 mM MgCl2 (Genentech) and
6 mM DTT (Sigma). Reactions were incubated with agitation for 4 h at 30 °C and
then overnight at 27.5 °C. Polyubiquitin-conjugated peptides were purified and
concentrated using 30 kDa cutoff microspin columns (Amicon Ultra).
Identification of A20 phosphorylation sites by mass spectrometric analysis.
Human and murine A20 proteins were reduced in sample buffer containing DTT
(Sigma, St Louis, MO) and alkylated in 0.176 M N-isopropyliodoacetamide
(synthesized in house; mass addition of 99.0684 Da to all cysteine residues).
Samples were separated by SDS-PAGE and stained. Bands around 90 kDa in each lane
were excised and digested with trypsin as previously described”. After overnight
digestion, the peptides were extracted from gel slices in acetonitrile and
evaporated to near dryness. Samples were then reconstituted in 10 μl of 0.1%
formic acid containing 2% acetonitrile and analysed by LC-MS/MS. Reconstituted
peptides were injected via an autosampler onto a 75 μm × 100 mm column (BEH,
1.7 μm; Waters Corp, Milford, MA) at a flow rate of 1 μl min−1 using a
NanoAcquity UPLC (Waters Corp, Milford, MA). A gradient from 98% solvent A
(water + 0.1% formic acid) to 80% solvent B (acetonitrile + 0.08% formic acid)
was applied over 45 min. Samples were analysed online via nanospray ionization
into a hybrid LTQ-Orbitrap Velos mass spectrometer (Thermo, San Jose, CA). Data
were collected in data-dependent mode, with the parent ion analysed in the FTMS
and the 15 most abundant ions selected for fragmentation in the LTQ. Tandem mass
spectrometric data were analysed using the search algorithm Mascot (Matrix
Science, London, UK). Searches were performed against the UniProt database with
a parent ion tolerance of less than 50 ppm, fixed N-isopropylcarboxamidomethyl
(NIPCAM) modification of Cys, and variable oxidation (Met) and phosphorylation
(Ser, Thr or Tyr). Phosphorylation sites were localized by de novo
interpretation of the spectra and using Ascore (Harvard University, Cambridge,
MA) as previously described”.
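A tolerance expressed in ppm scales with the mass being measured, whereas the 0.8 Da MS/MS tolerance used in the other searches is absolute. The minimal Python sketch below, with hypothetical function names, shows how a 50 ppm parent-ion tolerance translates into a mass window and an accept/reject test.

```python
def ppm_error(observed: float, theoretical: float) -> float:
    """Signed mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def within_tolerance(observed: float, theoretical: float, tol_ppm: float = 50.0) -> bool:
    """True if the observed mass lies inside a symmetric ppm window."""
    return abs(ppm_error(observed, theoretical)) <= tol_ppm


def tolerance_window(theoretical: float, tol_ppm: float = 50.0) -> tuple[float, float]:
    """Lower and upper mass bounds for a given ppm tolerance."""
    delta = theoretical * tol_ppm * 1e-6
    return theoretical - delta, theoretical + delta


if __name__ == "__main__":
    # At m/z 1000, a 50 ppm tolerance corresponds to +/- 0.05.
    print(tolerance_window(1000.0))
    print(within_tolerance(1000.04, 1000.0), within_tolerance(1000.06, 1000.0))
```

The same 50 ppm window is ten times wider in absolute terms at m/z 2000 than at m/z 200, which is why Orbitrap parent-ion tolerances are usually given in ppm.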
RNAi treatments and transfections. HEK293T cells were transfected with three
ON-TARGETplus TNFRSF1A siRNA SMARTpool (Thermo Scientific) oligonucleotides
using RNAiMAX transfection reagent (Life Technologies) according to the
manufacturer's instructions. The siRNA sequences are:

Oligo 1: sense = CAAAGGAACCUACUUGUACUU.

Oligo 2: sense = GAGCUUGAAGGAACUACUAUU. 

Oligo 4: sense = UCCAAGCUCUACUCCAUUGUU. 

MEFs were transfected with ON-TARGETplus RNF31 (HOIP) siRNA SMARTpool
(Thermo Scientific) using RNAiMAX transfection reagent (Life Technologies)
according to the manufacturer's instructions. The siRNA sequences are:

Oligo 1: sense = GCUGCAAGGUGCCGGGAAU. 

Oligo 2: sense = GCUAAGAGAGAGCGUUGAA. 

Oligo 3: sense = GCCAAGAUAAGAUGCGGAA. 

Oligo 4: sense = GGCAUUGACUGUCCGAAAU. 

To validate sufficient knockdown, lysates were probed with anti-HOIP antibodies
and/or HOIP transcript levels were evaluated. To this end, RNA was isolated from
transfected cells and TaqMan probes (Life Technologies) were used in a one-step
RT-PCR reaction (HOIP: Mm01313902_m1; GAPDH: Mm99999915_g1). Relative transcript
levels were determined by comparing normalized Ct values.
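The methods do not name the calculation behind "comparing normalized Ct values"; a common choice is the 2^-ΔΔCt method, in which the target Ct is first normalized to a reference gene (here GAPDH) and then to the control sample. A minimal Python sketch under that assumption, and assuming near-100% amplification efficiency for both assays; the function name and Ct values are hypothetical.

```python
from statistics import mean


def relative_expression(target_kd, reference_kd, target_ctrl, reference_ctrl):
    """Relative transcript level (knockdown vs control) by the 2^-ddCt method.

    Each argument is a list of replicate Ct values. Assumes ~100% amplification
    efficiency for target and reference assays, which the methods do not state.
    """
    delta_kd = mean(target_kd) - mean(reference_kd)        # dCt in knockdown cells
    delta_ctrl = mean(target_ctrl) - mean(reference_ctrl)  # dCt in control-siRNA cells
    return 2.0 ** -(delta_kd - delta_ctrl)


if __name__ == "__main__":
    # Illustrative Ct values only (HOIP target, GAPDH reference), not data from this study.
    remaining = relative_expression([27.0, 27.0], [18.0, 18.0], [25.0, 25.0], [18.0, 18.0])
    print(f"HOIP remaining: {remaining:.2f} ({(1 - remaining) * 100:.0f}% knockdown)")
    # HOIP remaining: 0.25 (75% knockdown)
```

Each cycle of delay corresponds to a twofold drop in starting template, so a ΔΔCt of 2 means one quarter of the control transcript level remains.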

To introduce a K256R point mutation into wild-type murine TNFR1 (mTNFR1), the
following primers were used in a QuikChange mutagenesis reaction:

mTNFRK256R forward primer: TGT AGG GAT CCC GTG CCT GTC AGA 
GAG GAG AAG GCT GGA AAG. 

mTNFRK256R reverse primer: CTT TCC AGC CTT CTC CTC TCT GAC 
AGG CAC GGG ATC CCT ACA. 
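In the standard QuikChange design both mutagenic primers anneal to the same site on opposite strands, so they are reverse complements of each other. The small Python sketch below (helper names are hypothetical) checks the pair above and also confirms the XbaI recognition site, TCTAGA, in the XbaI reverse cloning primer.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")
XBAI_SITE = "TCTAGA"


def clean(seq: str) -> str:
    """Drop the spaces used for codon grouping and normalize case."""
    return seq.replace(" ", "").upper()


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return clean(seq).translate(COMPLEMENT)[::-1]


FORWARD = "TGT AGG GAT CCC GTG CCT GTC AGA GAG GAG AAG GCT GGA AAG"
REVERSE = "CTT TCC AGC CTT CTC CTC TCT GAC AGG CAC GGG ATC CCT ACA"
XBAI_REVERSE = "CGG TCT AGA TTA TCG CGG GAG GCG GGT CGT"

if __name__ == "__main__":
    print(reverse_complement(FORWARD) == clean(REVERSE))  # True
    print(XBAI_SITE in clean(XBAI_REVERSE))                 # True
```

A mismatch in either check would point to a transcription error in the primer sequences rather than to the cloning strategy.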

The resulting constructs were sequenced to confirm successful mutagenesis, and a
second PCR reaction was carried out using the primers below to add an HA tag to
the N terminus of both wild-type and mutant mTNFR1.

mTNFR XbaI reverse primer: CGG TCT AGA TTA TCG CGG GAG GCG
GGT CGT.

The resulting PCR products and pcDNA3.1 vector were digested with EcoRI and
XbaI restriction enzymes and then ligated together. Plasmids were sequenced to
confirm the presence of the HA tag.

Live cell imaging experiments. For measurement of cell viability and caspase
activity, 2,500 MEFs previously transfected with the indicated siRNA
oligonucleotides were seeded per well of a 96-well plate (Corning catalogue
# 3904) in media containing 2 μM CellEvent Caspase-3/7 reagent (Life
Technologies catalogue # C10423). The following day, the cells were treated with
TNF and immediately placed in an IncuCyte live cell imager. Images were taken
every 2 h using a 10× objective. All treatments were done in triplicate. Phase
contrast was used to measure cell confluency/density, while green fluorescence
was used to measure caspase activity. The images were analysed using IncuCyte
software (Basic Analysis parameters), and a ratio of caspase activity to cell
density was determined. The area under the curve (AUC) for caspase
activity/cell density was determined for each treatment. Student's t-tests were
used to measure statistical significance, and error bars indicate s.e.m. To
measure cell death, 0.1 μM ethidium homodimer-1 (Life Technologies catalogue
# E1169) was added to the media before treatment with TNF. Cells were monitored
using IncuCyte software as indicated above. Dead cells were counted based on the
intensity of red fluorescence. Red counts were normalized to cell density. The
AUC for dead cells/cell density was determined for each treatment. Student's
t-tests were used to measure statistical significance, and error bars indicate
s.e.m.
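The imaging read-out reduces to three steps per well: divide the fluorescence count by cell density at each time point, integrate that ratio over time, then compare triplicate AUCs between treatments. A stdlib-only Python sketch; the trapezoidal rule and the pooled-variance t statistic are assumptions about how the IncuCyte exports were processed, and all names and numbers are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def trapezoid_auc(times, values):
    """Area under a sampled time course by the trapezoidal rule."""
    if len(times) != len(values) or len(times) < 2:
        raise ValueError("need matching series with at least two time points")
    return sum(
        (t1 - t0) * (v0 + v1) / 2
        for t0, t1, v0, v1 in zip(times, times[1:], values, values[1:])
    )


def normalized_auc(times, signal, density):
    """AUC of signal/density (caspase or dead-cell counts per unit cell density)."""
    return trapezoid_auc(times, [s / d for s, d in zip(signal, density)])


def sem(values):
    """Standard error of the mean: sample SD / sqrt(n)."""
    return stdev(values) / sqrt(len(values))


def student_t(a, b):
    """Two-sample Student's t statistic with pooled (equal) variance."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var * (1 / na + 1 / nb))


if __name__ == "__main__":
    hours = [0, 2, 4, 6]          # one image every 2 h
    caspase = [0, 10, 30, 60]     # hypothetical green-object counts
    confluence = [10, 10, 15, 20] # hypothetical phase-contrast density
    print(normalized_auc(hours, caspase, confluence))  # 9.0
    ctrl, knockdown = [1.0, 2.0, 3.0], [4.0, 5.0, 6.0]  # hypothetical AUC triplicates
    print(sem(ctrl), student_t(ctrl, knockdown))
```

The P value follows from the t distribution with n1 + n2 - 2 degrees of freedom; with SciPy available, `scipy.stats.ttest_ind(a, b)` runs the same pooled-variance test and returns both the statistic and the P value.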
Identification of Flag-TNF-associated proteins. Cells were treated with
recombinant Flag-TNF for the indicated times, and samples were prepared as
described in the ‘Western blot analysis and immunoprecipitations’ section. The
TNF receptor complex was immunoprecipitated with anti-Flag antibody-conjugated
agarose gel (Sigma). Proteins were eluted from the beads using 3×Flag peptide
(Sigma) and concentrated with a 10 kDa cutoff membrane filter. Eluates were
mixed with LDS sample buffer (Invitrogen), reduced with dithiothreitol (DTT),
alkylated with iodoacetamide and resolved on a 3-8% Tris acetate gel. Each lane
was cut into 5 bands and subjected to in-gel digestion. Briefly, each gel band
was digested with trypsin overnight at 37 °C in the presence of 25 mM ammonium
bicarbonate (AMBIC) at pH 8.0 for extraction. Peptides were further extracted
with 10% acetonitrile containing 0.1% trifluoroacetic acid. Extracts were
combined and dried under vacuum. Samples were reconstituted in 2% acetonitrile
with 0.1% formic acid and loaded onto a 0.1 × 100 mm analytical column packed
with 1.7 μm BEH-130 C18 using a NanoAcquity UPLC (Waters). Peptides were eluted
with a 60 min gradient from 2% to 25% acetonitrile at a flow rate of
1 μl min−1 and directly introduced into an LTQ-Orbitrap Elite mass spectrometer
(ThermoFisher Scientific) through an ADVANCE electrospray ionization source
(Michrom BioResources/Bruker, Auburn, CA). Full MS data were acquired in the
Orbitrap at a resolution of 60,000. The 15 most abundant precursors from the
preceding full MS spectrum were selected for CID fragmentation, and MS/MS
spectra were acquired in the ion trap. MS/MS data were searched using Mascot
(version 2.3, Matrix Science, London, UK). Search criteria included a full MS
tolerance of 50 ppm and an MS/MS tolerance of 0.8 Da, with oxidation
(+15.9949 Da) of methionine and ubiquitination (+114.0429 Da) of lysine as
variable modifications, carbamidomethylation (+57.0215 Da) of cysteine as a
static modification, and up to 3 missed cleavages. Data were searched against
the Mus musculus and contaminant subset of the UniProt database, supplemented
with the reversed protein sequences. Data were then filtered using linear
discriminant analysis (LDA) to a 10% false discovery rate (FDR) at the peptide
level. A further cutoff of 5% FDR was applied to the whole data set at the
protein level, which resulted in a 0.6% FDR at the peptide level.
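Including reversed sequences in the search database supports the usual target-decoy estimate of false discovery rate: the number of decoy hits above a score cut-off divided by the number of target hits. A minimal Python sketch; it takes a single precomputed score per peptide, standing in for the LDA discriminant score whose features are not described, and all names and data are hypothetical.

```python
def estimated_fdr(scores, is_decoy, threshold):
    """Decoy/target ratio among hits scoring at or above the threshold."""
    hits = [decoy for score, decoy in zip(scores, is_decoy) if score >= threshold]
    decoys = sum(hits)  # True counts as 1
    targets = len(hits) - decoys
    if targets == 0:
        return 0.0 if decoys == 0 else float("inf")
    return decoys / targets


def threshold_for_fdr(scores, is_decoy, max_fdr):
    """Lowest score threshold whose estimated FDR is at or below max_fdr (None if none)."""
    for threshold in sorted(set(scores)):
        if estimated_fdr(scores, is_decoy, threshold) <= max_fdr:
            return threshold
    return None


if __name__ == "__main__":
    # Hypothetical peptide scores; True marks a hit to a reversed (decoy) sequence.
    scores = [10, 9, 8, 7, 6, 5]
    decoy = [False, False, False, True, False, True]
    print(estimated_fdr(scores, decoy, 5))         # 0.5
    print(threshold_for_fdr(scores, decoy, 0.10))  # 8
```

Filtering first at the peptide level and then at the protein level, as described above, applies this kind of cut-off twice, which is why the final peptide-level FDR (0.6%) ends up well below the initial 10% threshold.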


© 2015 Macmillan Publishers Limited. All rights reserved 


Cloning, expression and purification of A20 ZnF proteins. The human A20 ZnF1
(K386-S453), ZnF4 (S592-K635) and ZnF7 (P758-G790) domains and mutants
(ZnF4(C624A,C627A); ZnF4(K606E,I629R); ZnF4(I629R); ZnF4(K606E); and
ZnF7(F770A,G771A)) were cloned into the EitNTH-NAvi vector with an N-terminal
His6 tag, Avi tag and TEV cleavage site, and transformed into the BL21-Gold
(DE3) E. coli strain. Expression of 15N-labelled proteins for NMR studies was
carried out in M9 media at 16 °C for approximately 20 h using 0.4 mM IPTG
induction. All biotinylated proteins were expressed in the same E. coli strain.
All proteins were purified at 4 °C using Ni-NTA resin (Qiagen), followed by
protease cleavage and a second Ni2+ affinity chromatography step to remove the
purification tag. Proteins were further purified by size exclusion
chromatography (Superdex 75) in buffer consisting of 20 mM Tris (pH 8.3),
300 mM NaCl and 1 mM tris(2-carboxyethyl)phosphine (TCEP).

NMR experiments and data analysis. NMR experiments were performed at 25 °C on a
Bruker 500 MHz spectrometer. All NMR samples were prepared in buffer containing
20 mM MES (pH 6.0), 150 mM NaCl, 0.5 mM TCEP and 10% (v/v) D2O, with
0.1-0.13 mM A20 ZnF motif. Mono-ubiquitin was added to the A20 ZnF motifs at
ubiquitin:ZnF molar ratios of 0 to 10 from an 11.1 mM stock of mono-ubiquitin.
The data were analysed and KD values were determined using Sparky and NMRViewJ
software, respectively.
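The methods do not state which binding model was used in NMRViewJ. For a titration in fast exchange, a common choice is the exact 1:1 quadratic, which accounts for ligand depletion at ZnF concentrations near 0.1 mM. A stdlib-only Python sketch under that assumption; the grid-search fit, the function names and the synthetic data are illustrative. For each trial KD the best maximal shift has a closed-form least-squares solution, so only KD needs to be searched.

```python
from math import sqrt


def fraction_bound(p_total, l_total, kd):
    """Fraction of protein bound for 1:1 binding (exact quadratic, same units throughout)."""
    b = p_total + l_total + kd
    complex_conc = (b - sqrt(b * b - 4 * p_total * l_total)) / 2
    return complex_conc / p_total


def fit_kd(p_total, l_totals, shifts, kd_grid):
    """Grid search over KD; for each KD the best max shift is a linear least-squares fit."""
    best = None
    for kd in kd_grid:
        f = [fraction_bound(p_total, l, kd) for l in l_totals]
        denom = sum(fi * fi for fi in f)
        if denom == 0:
            continue
        dmax = sum(fi * yi for fi, yi in zip(f, shifts)) / denom
        sse = sum((yi - dmax * fi) ** 2 for fi, yi in zip(f, shifts))
        if best is None or sse < best[2]:
            best = (kd, dmax, sse)
    return best


if __name__ == "__main__":
    protein = 0.1                                     # mM ZnF motif
    ubiquitin = [protein * r for r in range(11)]      # 0- to 10-fold ubiquitin
    # Synthetic shifts generated with KD = 0.5 mM and a 0.2 p.p.m. maximal perturbation.
    shifts = [0.2 * fraction_bound(protein, l, 0.5) for l in ubiquitin]
    kd, dmax, sse = fit_kd(protein, ubiquitin, shifts, [i / 100 for i in range(1, 201)])
    print(f"KD ~ {kd:.2f} mM, max shift ~ {dmax:.3f} p.p.m.")
```

With real spectra, each residue's combined 1H/15N shift perturbation would be fitted this way, and the 10-fold excess of ubiquitin is what lets the curve approach saturation for KD values in the sub-millimolar range.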

In vitro A20 zinc finger ubiquitin-binding assays. Binding of A20 ZnF motifs to
Ub trimers of varying linkages was measured by biolayer interferometry on an
Octet RED384 instrument (ForteBio, Inc.). Biotinylated ZnF1, ZnF4 and ZnF7
variants were captured onto streptavidin (SA) biosensors. Unbound material was
washed away with binding buffer (20 mM MES pH 6.0, 150 mM NaCl, 10% glycerol,
0.2 mM DTT, 0.01% Tween-20 and 0.1 mg ml−1 human serum albumin) before
association and dissociation measurements with trimeric Ub analytes. The binding
reactions displayed rapid saturation behaviour and thus were ideally suited to
equilibrium binding analysis; however, because trimeric ubiquitin chains possess
multiple binding sites for the ZnF motifs, these measurements were encumbered by
surface-dependent avidity. These binding curves displayed multiphasic behaviour:
the true binding phases were complicated by additional, artificially
high-affinity phases, in which adjacent ZnF molecules affixed to the tip surface
engaged the same ubiquitin trimer in an avid interaction. To avoid these
artefacts, we identified conditions in which avid interactions were minimized,
as follows. Using linear triubiquitin binding to ZnF7 as a test case, we
performed titrations at a range of ZnF7 loading densities, plotted equilibrium
response values (in nm) as a function of linear ubiquitin trimer concentration,
and fit the curves to a modified two-site binding equation:


$$R = R_{\max} \times \left[ \frac{(1 - n_{\mathrm{avid}})\,[A]}{K_D + [A]} + \frac{n_{\mathrm{avid}}\,[A]}{K_{D,\mathrm{avid}} + [A]} \right]$$


in which R is the response value (in nm), Rmax is the maximal response, navid is
the relative fraction of avid interactions, [A] is the total concentration of Ub
trimer, KD is the equilibrium dissociation constant for the non-avid
interactions, and KD,avid is the apparent equilibrium dissociation constant for
the avid interactions. This equation reflects the assumptions that avid
interactions have a much stronger affinity than non-avid interactions, and that
the stochastic placement of ZnF7 molecules on the tip surface ensures that some
non-avid interactions will occur regardless of loading density. We plotted the
resulting navid values as a function of ZnF7 loading density to determine the
ZnF loading range in which the fraction of avid interactions reached a minimal
plateau. This occurred at a loading density at or below 0.12 nm, and so all
subsequent measurements were conducted using this loading density. Once avid
interactions were minimized, the equilibrium response data were, in most cases,
well fit by a simple 1:1 binding equation:


$$R = R_{\max} \times \frac{[A]}{K_D + [A]},$$


where R is the response value, Rmax is the maximal response, [A] is the total Ub
trimer analyte concentration, and KD is the equilibrium dissociation constant.
The exception was ZnF7 binding to linear Ub trimer. Because ZnF7 has two binding
sites for ubiquitin, these data were fit to a standard two-site binding model:


$$R = \frac{R_{\max 1}\,[A]}{K_{D1} + [A]} + \frac{R_{\max 2}\,[A]}{K_{D2} + [A]}$$


Here each of the two equilibrium constants (KD1 and KD2) has a separate Rmax
value (Rmax1 and Rmax2, respectively). All data were fit using KaleidaGraph
version 4.03 (Synergy Software).
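The three equilibrium models in this section can be written as plain functions of the total analyte concentration [A]. A Python sketch, assuming the avidity model carries [A] in the numerator of both terms so that it reduces to the 1:1 equation when navid = 0; the function names and parameters are hypothetical, and this is not the KaleidaGraph fit itself.

```python
def one_site(a, rmax, kd):
    """1:1 binding: R = Rmax * [A] / (KD + [A])."""
    return rmax * a / (kd + a)


def two_site(a, rmax1, kd1, rmax2, kd2):
    """Two independent sites, each with its own KD and Rmax."""
    return one_site(a, rmax1, kd1) + one_site(a, rmax2, kd2)


def avidity_model(a, rmax, n_avid, kd, kd_avid):
    """Shared Rmax split between non-avid (1 - n_avid) and avid (n_avid) binding."""
    return rmax * ((1 - n_avid) * a / (kd + a) + n_avid * a / (kd_avid + a))


if __name__ == "__main__":
    # Hypothetical parameters in arbitrary concentration units, for illustration only.
    for a in [0.1, 1.0, 10.0, 100.0]:
        simple = one_site(a, rmax=1.0, kd=1.0)
        avid = avidity_model(a, rmax=1.0, n_avid=0.3, kd=1.0, kd_avid=0.01)
        print(f"[A]={a:>6}: 1:1 {simple:.3f}  with avidity {avid:.3f}")
```

At low analyte concentrations the avid term dominates and inflates the response, which is the artificially high-affinity phase the loading-density titration was designed to suppress. Fitting these functions to measured equilibrium responses is a nonlinear least-squares problem, for example with `scipy.optimize.curve_fit`.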
In vitro ubiquitination assays. E. coli-derived A20 was phosphorylated with
recombinant IKKβ (ProQinase, Active Motif or Life Technologies) in the following
reaction: up to 5 μg A20, up to 1.75 μg GST-IKKβ, 10 μM ATP, 25 mM Tris pH 7.5,
5 mM β-glycerophosphate, 1 mM DTT, 0.1 mM Na3VO4, 10 mM MgCl2 and 0.5%
phosphatase inhibitor cocktail-3. A20 ubiquitination reactions were performed
as previously described$ using UbcH5a or UbcH7 (Boston Biochem) as E2 enzymes.
Flag-wild-type A20 or Flag-A20 ZnF4(C609A,C612A) expressed in HEK293T cells and
purified by Flag peptide elution, or E. coli-derived A20 with or without IKKβ
phosphorylation as described above, were used as ubiquitin ligases, and the
HA-RIPK1 peptide or recombinant murine TNFR1 (Genentech) was used as substrate.
For assessment of TNFR1 K48 ubiquitination in vitro, 6 M urea was added to each
completed ubiquitination reaction for 15 min at room temperature with agitation
to dissociate proteins. The urea was diluted to 4 M, and the reactions were
immunoprecipitated overnight at 4 °C with 5 μg anti-K48 ubiquitin antibody
(Genentech) plus 1 μg anti-K48 ubiquitin antibody (CST). The immunocomplexes
were captured with protein A beads and processed as above for western blot
analysis.


49. Yu, M. et al. A resource for cell line authentication, annotation and quality 
control. Nature 520, 307-311 (2015). 

50. Wertz, I. E. et al. Sensitivity to antitubulin chemotherapeutics is regulated by
MCL1 and FBW7. Nature 471, 110-114 (2011). 
Extended Data Figure 1 | A model for A20 OTU and A20 ZnF4
regulation of TNF- and LPS-activated signalling. a, Left complex. Upon TNF
binding, TNFR1 forms a trimer, thereby promoting recruitment of the adaptor
protein TRADD and the RIP1 kinase (RIPK1). TRADD recruits TRAF2, TRAF5 and the
ubiquitin ligases cIAP1 and cIAP2. The cIAP proteins promote K63-linked
ubiquitination of signalling proteins including RIPK1, cIAP1/2
(autoubiquitination) and possibly TNFR1. K63 ubiquitination of cIAP1/2
subsequently recruits the LUBAC complex, which promotes linear
polyubiquitination of signalling proteins including RIPK1 and TNFR1. K63
ubiquitin chains on RIPK1 promote recruitment of the TAK1/TAB2/3 complex,
whereas linear ubiquitin chains on RIPK1 promote IKK kinase complex recruitment
via NEMO. Recruitment of these kinase complexes promotes their subsequent
activation and propagation of downstream JNK, p38 (via MKK3 and MKK4) and NF-κB
signalling pathways. We propose that A20 is recruited to the active TNFR1
signalling complex via ZnF7 binding to linear ubiquitin chains. The A20 OTU
domain catalytic C103 is essential for attenuating TNF-activated signalling by
removing K63 polyubiquitin chains from RIPK1 and other proteins including TNFR1,
thereby promoting the dissociation of the active signalling complex. The A20
ZnF4 motif, which depends on C609/C612 for structural integrity and Y599/F600
for ubiquitin binding, is likely to collaborate with other proteins (not shown)
to further downregulate TNF signalling by directing K48 polyubiquitination and
subsequent degradation of proximal complex proteins, including RIPK1 and TNFR1.
Right complex. LPS binding activates TLR4 and promotes the assembly of proximal
signalling complexes via the adaptors TRIF and TRAM (not shown) or Mal and
MyD88. Recruitment and activation of the proximal kinases IRAK4 and IRAK1, the
ubiquitin ligase Pellino (not shown) and the LUBAC complex promote K63 and
linear polyubiquitination of signalling proteins. As with TNFR1 signalling, this
scaffolding-type ubiquitination promotes recruitment of TAK1/TAB2/3 and IKK
kinase complexes, their subsequent activation, and propagation of downstream
JNK, p38 and NF-κB signalling pathways. A20 is probably recruited to the
LPS-activated signalling complex via ZnF7 binding to linear ubiquitin chains.
The A20 OTU domain catalytic C103 is essential for attenuating LPS-activated
signalling by removing K63 polyubiquitin chains from TRAF6 and possibly other
proximal signalling proteins. Although the structural integrity and the
ubiquitin-binding function of A20 ZnF4 are dispensable for proper attenuation of
TLR4 signalling, A20 ZnF4 could have a redundant function with another protein.
b, In A20 OTU(C103A) cells, removal of K63 ubiquitin chains from proximal
signalling components is compromised, so these proteins are hyperubiquitinated
with K63-linked chains. With sufficient linear ubiquitination, the
infrastructure of the signalling complex is sustained, caspase recruitment to
TNFR1 is prevented, and pro-survival signalling is enhanced. c, In A20
OTU(C103A) cells with deficient linear ubiquitination, removal of K63 ubiquitin
chains is still compromised; however, decreased linear chains favour enhanced
association of hyperubiquitinated RIPK1 with FADD and caspase 8, the proximal
components of the pro-death complex. Enhanced caspase 8 recruitment and
activation in turn activates downstream effector caspases (such as caspase 3 and
7), culminating in cell death.
Extended Data Figure 2 | Engineering and genotyping of A20 OTU mutant and A20
ZnF4 mutant knock-in mice. a, Schematic diagrams of the A20 protein indicating
the locations of the knock-in point mutations for each engineered mouse strain.
The gene and protein names are also indicated, with abbreviated protein names in
parentheses. b, Tnfaip3 knock-in allele encoding A20 OTU(C103A) and
representative genotyping data (right panel). c, Tnfaip3 knock-in allele
encoding A20 ZnF4(C609A,C612A) and representative genotyping data (right panel).
d, Tnfaip3 knock-in allele encoding A20 ZnF4(Y599A,F600A) and representative
genotyping data (right panel). b-d, Correctly targeted ES cell clones were
identified by long-range PCR followed by sequencing (data not shown). LoxP sites
are illustrated as yellow arrows, frt sites as red arrows. Modified exons 3 and
7 are indicated in blue. For gel source data, see Supplementary Fig. 6.




ARTICLE 


[Extended Data Figure 3 plots: a, c, body temperature versus time after TNF treatment (hours); b, d, heat maps of standardized serum cytokine values (untreated, 1 h, 2 h and 4 h TNF; 4 h PBS- or TNF-treated); e, EAE clinical score versus days post-immunization for OTU C103A and ZnF4 C609A,C612A mice.]
Extended Data Figure 3 | Analysis of TNF-challenged A20 wild-type, A20
OTU(C103A), A20 ZnF4(C609A,C612A) and A20 ZnF4(Y599A,F600A) mice. a, Body
temperatures of mice in response to treatment with 300 μg TNF per kg body
weight. Error bars are indicated for each data point and represent the mean ±
standard deviation of 3 or 4 mice per genotype. b, A heat map representing
profiles of serum cytokines in 12 different genotype/TNF treatment groups. Mice
(n = 3 or 4 per group) were treated for the indicated time with 300 μg TNF per
kg body weight; mean values per group are represented in the heat map. Each row
represents one cytokine, whose values were standardized to z-scores with a mean
of zero and a standard deviation of 1, and colour-coded according to the colour
key. Differences in selected serum cytokines from A20 wild-type, A20 OTU(C103A)
(OTU) or A20 ZnF4(C609A,C612A) (ZnF4 Cys) mice in response to TNF stimulation
were evaluated using Student's t-test: IL6 WT versus ZnF4 Cys 2 h P = 0.009,
4 h P = 0.030; IL6 WT versus OTU 2 h P = 0.023, 4 h P = 0.035; Cxcl1 WT versus
ZnF4 Cys 2 h P = 0.011, 4 h P = 0.017; Cxcl1 WT versus OTU 2 h P = 0.047,
4 h P = 0.043; Csf3 WT versus ZnF4 Cys 4 h P = 0.017, Csf3 WT versus OTU
4 h P = 0.042; Ccl11 WT versus ZnF4 Cys 4 h P = 0.0003, Ccl11 WT versus OTU
4 h P = 0.036.
c, Body temperatures of ZnF4 mutant mice in response to treatment with 300 μg
TNF per kg body weight. Error bars are indicated for each data point and
represent the mean ± standard deviation of 4 mice per genotype. d, A heat map
representing profiles of serum cytokines as in Extended Data Figure 3b, but the
indicated mice (n = 4 per group) were treated for four hours with 300 μg TNF per
kg body weight or PBS vehicle control. Differences in selected serum cytokines
from A20 ZnF4(C609A,C612A) (ZnF4 Cys), A20 ZnF4(Y599A,F600A) (ZnF4 Ub) or the
respective wild-type control mice in response to TNF stimulation were evaluated
using Student's t-test: IL6 WT versus ZnF4 Cys P = 0.000057; IL6 WT versus ZnF4
Ub P = 0.014; Cxcl1 WT versus ZnF4 Cys P = 0.016; Cxcl1 WT versus ZnF4 Ub
P = 0.012; Csf3 WT versus ZnF4 Cys P = 0.024, Csf3 WT versus ZnF4 Ub P = 0.0061;
Ccl11 WT versus ZnF4 Cys P = 0.005, Ccl11 WT versus ZnF4 Ub P = 0.001.
e, Analysis of myelin oligodendrocyte glycoprotein-induced
experimental autoimmune encephalomyelitis (MOG-EAE) studies in A20 WT, A20
OTU(C103A) and A20 ZnF4(C609A,C612A) mice. Top panel, MOG-EAE disease scores
over time (mean ± s.e.m.) for A20 WT (n = 15) and A20 OTU(C103A) (n = 14) mice.
A20 OTU(C103A) average daily clinical scores (ADCS) P = 0.012, Dunnett's test
versus A20 WT. Bottom panel, EAE disease scores over time (mean ± s.e.m.) for A20
WT (n = 13) and A20 ZnF4(C609A,C612A) (n = 12) mice. A20 ZnF4(C609A,C612A) ADCS
P = 0.046, Dunnett's test versus A20 WT. f, Low-power (upper panel A) and
high-power (lower panel B) microscopic images of a representative lumbar spinal
cord section from an A20 ZnF4(C609A,C612A) mouse with a grade 3 EAE clinical
score at study termination (day 30). The section is stained with haematoxylin,
eosin and Luxol fast blue. A, Foci of myelinopathy and gliosis (arrows). Gr,
grey matter; Wh, white matter; scale bar, 200 μm. B, Focally severe myelinopathy
and gliosis (double arrows) extending from the meninges close to the grey matter
(delineated by white dashed line). Scale bar, 50 μm. Data represent at least two
biological replicates.
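The row-wise standardization used for the heat maps in panels b and d converts each cytokine's group means to z-scores, so colours compare groups within a cytokine rather than absolute concentrations across cytokines. A Python sketch, assuming the sample standard deviation (n - 1) as in R's `scale()`; the function name and values are hypothetical.

```python
from statistics import mean, stdev


def zscore_rows(table):
    """Standardize each cytokine row across groups to mean 0 and SD 1."""
    scaled = {}
    for cytokine, values in table.items():
        mu, sd = mean(values), stdev(values)
        # A row with no variation carries no contrast; map it to zeros.
        scaled[cytokine] = [(v - mu) / sd if sd > 0 else 0.0 for v in values]
    return scaled


if __name__ == "__main__":
    # Hypothetical group means (pg/ml) for three groups; not data from this study.
    group_means = {"IL6": [10.0, 20.0, 30.0], "Cxcl1": [5.0, 5.0, 5.0]}
    print(zscore_rows(group_means))
    # {'IL6': [-1.0, 0.0, 1.0], 'Cxcl1': [0.0, 0.0, 0.0]}
```

Because every row is rescaled independently, a strongly induced cytokine and a weakly induced one can share the same colour range, so the heat map shows relative patterns and the P values in the legend carry the absolute comparisons.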
Extended Data Figure 4 | A20 proteins from wild-type, A20 OTU(C103A), A20
ZnF4(C609A,C612A) and A20 ZnF4(Y599A,F600A) cells are efficiently recruited to
TNFR1 and regulate downstream signalling. a, Immunoblot analysis of
Flag-TNF-engaged immunocomplexes and the corresponding whole-cell lysates in
wild-type and A20 OTU(C103A) MEFs. b, Immunoblot analysis of Flag-TNF-engaged
immunocomplexes and the corresponding whole-cell lysates in wild-type and A20
ZnF4(C609A,C612A) MEFs. c, Immunoblot analysis of Flag-TNF-engaged
immunocomplexes and the corresponding whole-cell lysates in wild-type and A20
ZnF4(Y599A,F600A) MEFs. d, Immunoblot analysis of Flag-TNF-treated whole-cell
lysates in wild-type and A20 OTU(C103A) MEFs. Asterisk, background band; arrow,
phospho-MKK4. UnRx, untreated. e, Immunoblot analysis of Flag-TNF-treated
whole-cell lysates in wild-type and A20 ZnF4(C609A,C612A) MEFs. Asterisk,
background band; arrow, phospho-MKK4. UnRx, untreated. f, Immunoblot analysis of
whole-cell lysates from TNF-treated wild-type and A20 OTU(C103A) MEFs.
Immunoblot analyses of whole-cell lysates from TNF-treated wild-type, A20
OTU(C103A) and A20-null MEFs following TNF pre-treatment to induce A20
expression, as well as of TNF-treated A20 wild-type and A20 OTU(C103A) primary
BMDMs and TNF-treated A20 wild-type and A20 OTU(C103A) immortalized BMDMs, all
showed similar trends (not shown). g, Immunoblot analysis of whole-cell lysates
from TNF-treated wild-type and A20 ZnF4(C609A,C612A) E1A-transformed MEFs. UnRx,
untreated. Immunoblot analyses of whole-cell lysates from TNF-treated wild-type
and A20 ZnF4(C609A,C612A) MEFs following TNF pre-treatment to induce A20
expression, and of whole-cell lysates from TNF-treated A20 wild-type and A20
ZnF4(C609A,C612A) primary BMDMs, all showed similar trends (not shown).
h, Immunoblot analysis of whole-cell lysates from TNF-treated wild-type and A20
ZnF4(Y599A,F600A) primary MEFs. For gel source data, see Supplementary Figs 6,
7. Data represent two to four biological replicates.
Extended Data Figure 5 | Additional analysis of the ubiquitination status of
TNFR1 and associated proteins. a, A summary table of selected proteins
identified by LC-MS/MS analysis in anti-Flag immunocomplexes from untreated or
Flag-TNF-treated wild-type MEFs (left columns), and a summary of the
ubiquitination sites identified on the indicated proteins from TNF-treated A20
wild-type MEFs with the PTMScan approach using anti-K-ε-GG antibodies and
LC-MS/MS (right column). b, Analysis of TNF-engaged TNFR1 in untreated or
Flag-TNF-treated E1A-transformed A20 wild-type or A20 OTU(C103A) MEFs, or in
A20-null primary MEFs. Anti-Flag immunocomplexes were purified using Flag
peptide elution, and eluates were blotted for TNFR1. Immunoblots of the
corresponding whole-cell lysates are shown below. c, Lysates corresponding to
Fig. 1c. d, Murine TNFR1(K256R) attenuates TNFR1 ubiquitination and downstream
signalling. Murine wild-type TNFR1 or TNFR1(K256R) was transfected into human
293T cells, and cells were treated with Flag-TNF as indicated. Equal inputs of
lysates were immunoprecipitated with anti-Flag, dissociated, re-immunoprecipitated
with anti-linear ubiquitin antibody and blotted for murine TNFR1, or lysates
were blotted with the indicated antibodies. e, Analysis of TNFR1-associated
RIPK1 K63 ubiquitination (Ub) status and TNFR1 immunoprecipitates in
Flag-TNF-treated wild-type and A20 OTU(C103A) MEFs. f, Comparison of activated
TNFR1 ubiquitination status in E1A-transformed A20 wild-type and OTU(C103A)
MEFs. Treated cells were lysed in buffer containing 6 M urea and
immunoprecipitated with the indicated antibodies under denaturing conditions.
g, Summary table of RIPK1 ubiquitination sites identified from TNF-treated A20
wild-type and OTU(C103A) MEFs with the PTMScan approach using anti-K-ε-GG
antibodies and LC-MS/MS. Peptides were quantified by area under the curve (AUC)
and summarized to site level. The equivalent human RIPK1 residues are also
indicated. The average A20 OTU(C103A):wild-type A20 ratio of endogenous RIPK1
ubiquitination sites is 1.7. Additional TNFR1 mass spectrometry data are shown
in Supplementary Information a-c. For gel source data, see Supplementary Figs 7,
8. Data represent two to four biological replicates.
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Extended Data Figure 6 | In vitro deubiquitination assays. Additional 
data corresponding to Fig. 2. Normalized A20 WT or OTU(C1034A) inputs 
purified from E. coli or mammalian HEK 293T cells are shown in Fig. 2a. 
a, A20-mediated cleavage efficacy of K63- or K48-linked tetraubiquitin 
conjugated to a HA-tagged RIPK1 peptide. b, Sequence of the HA 
epitope-tagged human RIPK1 peptide. The HA epitope tag is shown in 
blue, the human RIPK1 residues in black, and K377 is highlighted in red. 
c, Cleavage time course of linear tetraubiquitin by purified A20 WT or 
OTU(C103A) from E. coli or from mammalian HEK 293T cells. Input 
protein levels are shown in Fig. 2a. d, A schematic of the human A20 
protein indicating where the phosphorylation sites are localized. Mass 
spectrometry PhosphoSite analysis (http://www.phosphosite.org/) of 


A20 derived from mammalian expression systems is shown in 
Supplementary Information d. e, Comparison of the cleavage efficacy of 
K63-linked tetraubiquitin with increasing doses of human wild-type A20 
or phospho-site mutant A20 (4x phos mut). 4x phos mut: S381A, S480A, 
S565A, and T625A. Wild-type or phos mut A20 proteins were expressed 
in and purified from mammalian HEK 293T cells. f, Tandem mass 
spectrum for the S381-containing peptide from human A20 expressed
in E. coli and phosphorylated with recombinant IKKβ. g, Cleavage efficacy
of linear tetraubiquitin chains by increasing doses of E. coli-derived
wild-type A20, IKKβ alone, or IKKβ-phosphorylated A20. For gel source
data, see Supplementary Figs 8, 9. Data represent two to five biological 
replicates. 
Extended Data Figure 7 | Effects of linear ubiquitination in modulating
TNFR1 signalling and cell viability. a, HOIP RNAi decreases linear
ubiquitination of TNFR1. Wild-type MEFs were transfected with control
or HOIP RNAi oligonucleotides and treated for the indicated times
with TNF. Treated cells were lysed in buffer containing 6 M urea and
immunoprecipitated with the indicated antibodies under denaturing 
conditions. Immunoprecipitates and whole-cell lysates were blotted as 
indicated. b, Area under the curve (AUC) data corresponding to cell
death data in Fig. 3a. All error bars are s.e.m. for technical triplicates.
***P < 0.001, determined by t-test. c, Murine TNFR1(K256R) enhances
caspase activation. Human HEK 293T cells were treated with control or 
with human TNFR1 RNAi oligonucleotides and transfected with murine
wild-type TNFR1 or TNFR1(K256R) as indicated. Cells were treated with
TNF as indicated and equal inputs of lysates were immunoblotted with the
indicated antibodies. d, Murine TNFR1(K256R) does not modulate MAPK
or NF-κB signalling. Human HEK 293T cells were treated with control or
with human TNFR1 RNAi oligonucleotides and transfected with murine
wild-type TNFR1 or TNFR1(K256R) as indicated. Cells were treated with
TNF as indicated and equal inputs of lysates were immunoblotted with
the indicated antibodies. e, Evaluation of the specificity of anti-caspase
8 antibodies. E1A-transformed MEFs of the indicated genotype were
lysed in buffer containing 6 M urea, quantified, and immunoblotted with 
the indicated antibodies as detailed in the Methods. f, A summary table 
of cell death proteins identified in anti-Flag immunocomplexes from
untreated or Flag-TNF-treated wild-type MEFs by LC-MS/MS analysis
(also see Supplementary Fig. 6a). g, Sequences of the caspase 8 peptides
in anti-Flag immunocomplexes from untreated or Flag-TNF-treated
wild-type MEFs by LC-MS/MS analysis. h, Left panels, analysis of FADD
immunoprecipitates and FADD-associated RIPK1 K63 ubiquitination 
(Ub) status in TNF-treated wild-type and A20 OTU(C103A) MEFs 
transfected with control- or HOIP RNAi oligonucleotides. Right panels, 
immunoblot analysis of whole-cell lysates from TNF-treated wild-type 
and A20 OTU(C103A) MEFs transfected with control- or HOIP RNAi 
oligonucleotides. HOIP knockdown was validated by RT-PCR analysis 
(data not shown). For gel source data, see Supplementary Figs 9, 10. 
Data represent two to three biological replicates. 
Extended Data Figure 8 | A20 OTU domain, but not the ZnF4 motif,
downmodulates LPS signalling. a, Kaplan-Meier survival curves of A20
WT (n = 15) and A20 OTU(C103A) (n = 15) mice in response to 20 mg
LPS per kg body weight. Log rank P = 0.0002, Wilcoxon P < 0.0001.
b, Upper panel, analysis of TRAF6 K63 ubiquitination (Ub) status in
LPS-treated WT and A20 OTU(C103A) primary BMDMs. Lower panels, 
immunoblot analysis of whole-cell lysates from LPS-treated wild-type 
and A20 OTU(C103A) primary BMDMs. Asterisk, background band. 
Similar trends were seen in wild-type and A20 OTU(C103A) MEFs in 
response to acute LPS treatment and following LPS pre-treatment to 
induce A20 expression (not shown). c, Kaplan-Meier survival curves
of A20 wild-type (n = 10), A20 ZnF4(C609A,C612A) (n = 10), and
A20 ZnF4(Y599A,F600A) (n = 9) mice in response to 20 mg LPS
per kg body weight. Log rank P = 0.1531, Wilcoxon P = 0.1398 for
A20 ZnF4(C609A,C612A) versus WT; Log rank P = 0.4103, Wilcoxon
P = 0.3373 for A20 ZnF4(Y599A,F600A) versus WT. d, A heat map
representing profiles of serum cytokines in 12 different genotype/LPS 
treatment groups. Mice (n = 3 or 4 per group) were treated for the
indicated time with 40 mg LPS per kg body weight as indicated;
mean values per group are represented in the heat map. Each row
represents one cytokine, whose values were standardized to z-scores with a
mean of zero and a standard deviation of 1, and colour-coded according to 
the colour key. Differences in selected serum cytokines from A20 WT,
A20 OTU(C103A) (OTU), or A20 ZnF4(C609A,C612A) (ZnF4 Cys)
mice in response to LPS stimulation were evaluated using Student's
t-test: TNF WT versus OTU 2 h P = 0.044; IFNγ WT versus OTU 4 h
P = 0.033; Ccl4 WT versus OTU 4 h P = 0.040. Profiles of selected serum
cytokines from A20 WT (n = 5), ZnF4 Cys (n = 4), or OTU(C103A)
(n = 5) mice in response to PBS or low-dose (5 mg LPS per kg body
weight) LPS and collected at 2 h or 6 h post-stimulation showed similarly
significant differences between A20 WT and A20 OTU(C103A) (OTU)
but not between A20 WT and A20 ZnF4(C609A,C612A) (ZnF4 Cys)
(not shown). e, Upper panel, immunoblot analysis of whole-cell lysates
from LPS-treated wild-type A20 and A20 ZnF4 C609,612A MEFs.
Lower panel, analysis of TRAF6 K63 ubiquitination (Ub) status in the
corresponding MEFs. Similar trends were seen in lysates from LPS-treated 
wild-type and A20 ZnF4 C609,612A MEFs following LPS pre-treatment 
to induce A20 expression (not shown). For gel source data, see 
Supplementary Fig. 10. Data represent two biological replicates. 
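The row-wise standardization used for the heat map in panel d can be sketched as follows. This is a minimal illustration, not the authors' analysis code. It assumes the sample standard deviation (n − 1 denominator), which the caption does not specify, and the cytokine values are hypothetical placeholders.

```python
from statistics import mean, stdev


def zscore_rows(table):
    """Standardize each row (one cytokine across groups) to mean 0 and s.d. 1."""
    result = {}
    for cytokine, values in table.items():
        mu = mean(values)
        sd = stdev(values)  # sample s.d.; an assumption, not stated in the caption
        result[cytokine] = [(v - mu) / sd for v in values]
    return result


# Hypothetical group means (arbitrary units) for two cytokines across four groups.
group_means = {
    "TNF": [120.0, 480.0, 150.0, 510.0],
    "Ccl4": [30.0, 35.0, 90.0, 100.0],
}
z = zscore_rows(group_means)
```

Because each row is standardized on its own, the colour key compares relative changes within a cytokine, not absolute concentrations across cytokines.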
Extended Data Figure 9 | Characterization of A20 ZnF mutants and
their cellular effects. a, Summary of binding data of wild-type human
A20 ZnF motifs and ZnF mutants to mono-ubiquitin, as measured by
NMR, and to tri-ubiquitin chains, as measured by biolayer interferometry.
Data are shown in Fig. 5b and Supplementary Information h, g.
b, Analysis of TNF-engaged TNFR1 in untreated or TNF-treated
E1A-transformed A20 wild-type or A20 OTU(C103A) MEFs. Anti-TNF
immunocomplexes were captured using anti-TNFR1 antibody-coupled
beads and elutions were blotted for TNFR1. c, Analysis of TNFR1-
associated RIPK1 K48 ubiquitination (Ub) status in TNF-treated
wild-type and A20 ZnF4(C609A,C612A) MEFs. Immunoblot analysis
of the corresponding whole-cell lysates is shown in the lower panels.
d, Flag-wild-type A20 ubiquitinates recombinant murine TNFR1 with K48
chains. Flag-wild-type A20 or Flag-A20 ZnF4(C609A,C612A) proteins
purified from HEK 293T lysates were added to in vitro reactions with
recombinant murine TNFR1 and ubiquitin system enzymes. Reactions
were immunoprecipitated in 4 M urea using an anti-K48 ubiquitin
antibody, and the immunoprecipitates or reaction inputs were
immunoblotted as indicated.
e, IKKβ-phosphorylated A20, but not IKKβ alone, promotes in vitro
ubiquitination. For gel source data, see Supplementary Fig. 11. Data 
represent two to three biological replicates. 
A large-scale dynamo and magnetoturbulence in
rapidly rotating core-collapse supernovae 


Philipp Mösta, Christian D. Ott, David Radice, Luke F. Roberts, Erik Schnetter & Roland Haas


Magnetohydrodynamic turbulence is important in many high-energy
astrophysical systems, where instabilities can amplify the local
magnetic field over very short timescales. Specifically, the
magnetorotational instability and dynamo action have been suggested
as a mechanism for the growth of magnetar-strength magnetic fields
(of 10^15 gauss and above) and for powering the explosion of a rotating
massive star. Such stars are candidate progenitors of type Ic-bl
hypernovae, which make up all supernovae that are connected to long
γ-ray bursts. The magnetorotational instability has been studied with
local high-resolution shearing-box simulations in three dimensions,
and with global two-dimensional simulations, but it is not known
whether turbulence driven by this instability can result in the
creation of a large-scale, ordered and dynamically relevant field.
Here we report results from global, three-dimensional,
general-relativistic magnetohydrodynamic turbulence simulations. We
show that hydromagnetic turbulence in rapidly rotating protoneutron
stars produces an inverse cascade of energy. We find a large-scale,
ordered toroidal field that is consistent with the formation of bipolar
magnetorotationally driven outflows. Our results demonstrate that
rapidly rotating massive stars are plausible progenitors for both type
Ic-bl supernovae and long γ-ray bursts, and provide a viable
mechanism for the formation of magnetars. Moreover, our findings
suggest that rapidly rotating massive stars might lie behind
potentially magnetar-powered superluminous supernovae.

We study magnetohydrodynamic (MHD) turbulence in the shear
layer around a rapidly rotating protoneutron star by using
high-resolution, global, three-dimensional, general-relativistic (GR)
MHD simulations (the resolution is about ten times higher than that of
previous simulations). We take our initial conditions from a full
three-dimensional GRMHD adaptive-mesh-refinement simulation of stellar
collapse in a rapidly spinning progenitor star. The initial spin period of
the iron core, P_0, is 2.25 s before collapse; the spin period of the
protoneutron star after core bounce (when the collapsing core rebounds,
launching the initial shock wave), P_PNS, is 1.18 ms; and the initial
maximum magnetic field is 10^10 G. We map to a high-resolution domain
at time t_map = 20 ms after core bounce. At this time, flux compression
and linear winding have built up a maximum toroidal field of about
7 × 10^13 G close to the rotation axis of the protoneutron star, and about
3 × 10^13 G in the equatorial region. The maximum poloidal magnetic
field is about 7 × 10^13 G at t_map = 20 ms after core bounce. We carry out
simulations at four resolutions, dx = [500 m, 200 m, 100 m, 50 m]; adopt
a domain size of 66.5 km in the x and y directions and 133 km in the z
direction (rotation axis); and use a 90° rotational symmetry in the x-y
plane (with no symmetry imposed in the z direction). This allows us to
study the shear layer surrounding the core of the protoneutron star with
unprecedented resolution, using fully self-consistent global
three-dimensional simulations of MHD turbulence in stellar collapse.


The two lowest-resolution simulations show no or only minor
amplification of the toroidal magnetic field, consistent with not
resolving the fastest-growing mode (FGM) of the magnetorotational
instability (MRI). The toroidal field in the two highest-resolution
simulations exhibits exponential growth soon after the start of the
simulations (Fig. 1). The poloidal magnetic field evolution follows the
toroidal one closely (Extended Data Fig. 1). The initial transition to
exponential growth in the global maximum toroidal field (Fig. 1a), and in
the maximum toroidal field in a box with height 7.5 km above and below
the equatorial plane (Fig. 1b), is nearly identical between the 100-m and
the 50-m simulations. This indicates that we can resolve the FGM of the
MRI with the 100-m simulation, and is consistent with our background
flow stability analysis of the initial adaptive-mesh-refinement
simulation (Extended Data Fig. 2). The observed growth time of
τ ≈ 0.5 ms agrees well with the analytically predicted growth time of the
FGM from linear analysis. The field evolution quickly becomes nonlinear,
and this rapid growth reaches a fully turbulent saturated state within
3 ms. The turbulent saturated toroidal field strength agrees to within a
factor of two between the two highest-resolution simulations (100 m
and 50 m). Once nonlinear field strength is reached, secondary modes
and couplings between individual modes become important for the
observed growth time of the MRI. The final turbulent saturation field
is not converged and differs between resolutions, because secondary
instabilities, resistivity, and finite-resolution effects become
important. However, these differences decrease with increasing
resolution and we expect our results to hold when even higher-resolution
simulations become computationally accessible. This expectation is
supported by the fact that the local features of our global
three-dimensional simulations are consistent with previous
higher-resolution (dx = 10 m) local simulations of the MRI.

Figure 1 | Evolution of the maximum toroidal magnetic field. Both
panels show the maximum toroidal magnetic field (B^φ) as a function of
time for the four resolutions 500 m, 200 m, 100 m and 50 m. a, The global
maximum field; b, the maximum field in a thin layer above and below
the equatorial plane (−7.5 km < z < 7.5 km). The magenta line indicates
exponential growth with an exponential-folding time of τ = 0.5 ms.

Figure 2 | Radial magnetic field strength. a-d, Visualization of the radial
component of the magnetic field (B^r) in two-dimensional r-z slices at
an azimuth of 45°, for the four resolutions 500 m (a), 200 m (b), 100 m (c)
and 50 m (d), at t − t_map = 7.6 ms. The colour map ranges from positive
10^15 G (yellow) to negative 10^15 G (light blue).
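An exponential-folding time such as τ ≈ 0.5 ms can be read off a least-squares fit of ln B against time during the growth phase. The sketch below is illustrative only: the time series is synthetic, generated with τ = 0.5 ms, and in practice the fitting window would have to be chosen by hand to exclude the initial transient and the saturated state.

```python
import math


def efolding_time(times_ms, values):
    """Fit ln(values) = a + t / tau by ordinary least squares; return tau (ms)."""
    logs = [math.log(v) for v in values]
    n = len(times_ms)
    t_mean = sum(times_ms) / n
    l_mean = sum(logs) / n
    cov = sum((t - t_mean) * (l - l_mean) for t, l in zip(times_ms, logs))
    var = sum((t - t_mean) ** 2 for t in times_ms)
    slope = cov / var  # d(ln B)/dt in 1/ms
    return 1.0 / slope


# Synthetic growth phase: B(t) = 1e13 G * exp(t / 0.5 ms), sampled every 0.1 ms.
times = [0.1 * i for i in range(20)]
field = [1e13 * math.exp(t / 0.5) for t in times]
tau = efolding_time(times, field)
```

Fitting in log space turns the exponential into a straight line, so any constant prefactor (the field at the start of the window) drops out of τ.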

The resolution dependence of the magnetic field in the turbulent
state is striking (Fig. 2). Although the 500-m and 200-m simulations
show no or only mild turbulence, the 100-m and 50-m simulations
develop a fully turbulent shear layer around the protoneutron star. We
observe radial filaments of magnetic field that oscillate from negative to
positive values on a length scale of 1 km, consistent with the predicted
wavelength of the FGM of the MRI (Extended Data Fig. 2). These
structures resemble the channel flows that are observed in shearing-box
simulations, but do not stay coherent because of the background flow.
Similar non-coherent filaments have also been observed in
two-dimensional global simulations.

Figure 3 | Turbulent kinetic and electromagnetic energy spectra.
a, b, Energy as a function of the dimensionless wavenumber k (E(k)).
Panel a compares the electromagnetic energy (E_mag) across all
four resolutions at t − t_map = 10 ms. Panel b shows a time series of
electromagnetic energy spectra for the 50-m simulation only. Both panels
show the turbulent kinetic energy (E_kin) as computed from the 50-m
simulation (black solid line); a line indicating Kolmogorov scaling
(k^(-5/3)) (purple dashed line); and the initial electromagnetic energy
spectrum (black dashed line). c, The electromagnetic energy at a given
wavenumber (E_k,mag(t)) versus time, with an exponential fit (purple
solid line) and a linear fit (purple dashed line).

The turbulent kinetic and electromagnetic energy spectra calculated
from our simulations are shown in Fig. 3. Initially, the turbulent
kinetic energy, which is nearly constant in time, is several orders of
magnitude larger across all scales than the electromagnetic energy.
The electromagnetic energy is highly time and resolution dependent.
Although the low-resolution calculation shows little evolution away
from the initial spectrum, the higher-resolution calculations saturate
at progressively larger energies at large values of the dimensionless
wavenumber k (Fig. 3a). The saturation value at large and intermediate
k values is within a factor of three of equipartition with the turbulent
kinetic energy in the 50-m calculation. After saturation is reached
at large k values, we observe an inverse cascade of energy, which
triggers the growth of large-scale electromagnetic energy peaking at
k = 4, corresponding to a length scale of 5 km for our domain. This
is well below the driving wavenumber of the FGM of the MRI (k = 20) and
consistent with the structures evident in Figs 2d and 4c. The growth
in the first 7 ms is fitted well by an exponential with an
exponential-folding time τ = 3.5 ms. We observe a transition away from
clean exponential growth for t − t_map ≳ 7 ms; this transition might be
caused by the magnetic field becoming dynamically relevant, and/or by
(numerical) resistivity becoming important for the magnetic field
evolution. Here, the growth at k = 4 is better described by a linear fit.
In an inverse cascade, the energy is expected to reach approximately
the same relative saturation value (with respect to the driving
turbulent kinetic energy) at all k values with sufficiently long evolution
times. We find evidence for this in the range 10 < k < 50, where
the magnetic energy spectrum begins to evolve towards a power-law
scaling similar to that of the turbulent kinetic energy. Assuming that
this also holds at smaller k values, we can extrapolate the growth of
magnetic energy on the basis of the linear fit (Fig. 3c). We expect
to reach saturation electromagnetic energy at small k values within
t − t_map ≈ 60 ms. The observed differences between the calculations
for 100-m and 50-m resolution, in their saturation energies at large
k values and in their inverse energy cascades, indicate that the
turbulent state is not fully captured with the 100-m simulation and that
the efficiency of the inverse cascade may still increase when going to
even higher resolutions than 50 m.

Our results indicate that the electromagnetic energy will rival the
turbulent kinetic energy and dominate over the less efficient neutrino
heating. Therefore, MHD stresses are probably the dominant
factor in reviving the stalled shock in rapidly rotating progenitors.
Furthermore, we observe the formation of a large-scale, structured
toroidal magnetic field near the rotation axis of the protoneutron star in
the later stages of the 50-m simulation (Fig. 4c and Extended Data
Fig. 3d). This large-scale field is not present in the initial data
(Fig. 4a), nor does it develop in the lower-resolution cases (Fig. 4b and
Extended Data Fig. 3a-c). This magnetar-strength toroidal field close to
the rotation axis is a strong indication that hoop stresses, which favour
the formation of MHD-powered outflows, are present along the poles.
Velocity vectors along the rotation axis are pointing outwards towards
the end of the 50-m simulation, indicating the successful formation of
bipolar outflows (Extended Data Fig. 4).

Figure 4 | Three-dimensional volume renderings of the toroidal
magnetic field, B^φ. All panels show ray-casting volume renderings of
B^φ. The rotation axis z is the vertical axis, and the volume renderings are
generated using a varying-alpha colour map. Yellow indicates a positive
field of strength 10^15 G; red denotes a weaker positive field; light blue
corresponds to a negative field of strength 10^15 G; darker blue indicates
a weaker negative field. a, The initial conditions for our simulations;
b, the 500-m simulation at time t − t_map = 10 ms; c, the 50-m simulation
at t − t_map = 10 ms.
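The saturation-time extrapolation in the spectral discussion rests on simple arithmetic. If the late-time energy at a given k follows E_k(t) ≈ a + b t, then a saturation energy E_sat is reached at t_sat = (E_sat − a)/b. The sketch below shows only that step. The sample points and E_sat are hypothetical numbers in arbitrary units chosen for the example, not data from Fig. 3c.

```python
def linear_fit(ts, es):
    """Ordinary least-squares fit e = a + b * t; returns (a, b)."""
    n = len(ts)
    t_mean = sum(ts) / n
    e_mean = sum(es) / n
    num = sum((t - t_mean) * (e - e_mean) for t, e in zip(ts, es))
    den = sum((t - t_mean) ** 2 for t in ts)
    b = num / den
    a = e_mean - b * t_mean
    return a, b


def time_to_saturation(a, b, e_sat):
    """Time at which the linear trend a + b * t reaches e_sat."""
    if b <= 0:
        raise ValueError("energy is not growing; no finite saturation time")
    return (e_sat - a) / b


# Hypothetical late-time samples (t - t_map in ms, energy in arbitrary units).
ts = [7.0, 8.0, 9.0, 10.0]
es = [1.0, 1.2, 1.4, 1.6]
a, b = linear_fit(ts, es)
t_sat = time_to_saturation(a, b, e_sat=5.6)
```

The result is only as good as the assumption that growth stays linear up to saturation, which is why the text frames the estimate as an extrapolation.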

Our findings have implications for stellar collapse in rapidly rotating
massive stars. The MRI is a weak-field instability (that is, its
growth time, τ_FGM, does not depend on the strength of the magnetic
field), and the observed rapid exponential-folding time of τ ≈ 0.5 ms
is short enough that the scenario presented here is viable even for
much weaker initial seed fields. In addition, the MRI has been shown
to operate efficiently in purely toroidal, mixed poloidal/toroidal,
and random magnetic-field configurations. Hence, we expect our
results to hold for arbitrary precollapse magnetic-field configurations.
Moreover, low-order multipole (m = 1) instabilities, shown to alter the
explosion geometry of jet explosions in the full three-dimensional
simulations of ref. 9, will start to become relevant only after a
large-scale toroidal field of magnetar strength has been built up (the
instability criterion depends on having an ultrastrong toroidal field
present in the first place). This makes MHD-driven explosions a likely
scenario in rapidly rotating progenitors independently of the initial
magnetization of the star, with the explosion geometry probably being of
the double-lobe form shown in ref. 9. Finally, the large-scale build-up
of magnetic field in the shear layer of the protoneutron star
demonstrates that MRI-driven turbulence is a promising mechanism for
forming pulsars and magnetars in rapidly rotating stellar collapse.
This indicates that rapidly rotating massive stars can also account
for potentially magnetar-powered superluminous supernovae.


Online Content Methods, along with any additional Extended Data display items and 
Source Data, are available in the online version of the paper; references unique to 
these sections appear only in the online paper. 
METHODS


Initial conditions: stellar collapse simulation. We start by performing a
dynamical space-time ideal MHD simulation with adaptive mesh refinement
(AMR) of the 25 M⊙ (at zero-age main sequence) presupernova model E25
(ref. 31), with initial conditions for differential rotation as in ref. 9
(initial central angular velocity of the iron core is 2.8 rad s^-1;
x_0 = 500 km; z_0 = 2,000 km; M⊙, mass of the Sun). This model could be
considered as a type Ic-bl/hypernova and long γ-ray burst progenitor. At
the onset of collapse, we set up a modified dipolar magnetic field structure
from a vector potential A with components A_r = A_θ = 0 and
$A_\phi = B_0 \frac{r_0^3}{r^3 + r_0^3}\, r \sin\theta$, where r is the
radius, r_0 = 1,000 km (as in ref. 9) is a parameter controlling the
fall-off of the magnetic field, and B_0 = 10^10 G sets the initial strength
of the magnetic field. This progenitor seed field is not unreasonable for
γ-ray-burst supernova progenitor cores. With the grid set-up (nine levels
of box-in-box AMR; finest resolution dx = 375 m) and methods being
identical to those in refs 9 and 33, we follow this simulation until
t_map = 20 ms after core bounce. At this time, the initial supernova shock
wave has stalled at a radius of about 130 km.

Extended Data Figs 5 and 6 show the radial profiles of important state variables
(density, entropy, angular velocity and fast magnetosonic speed) of the simulation
at the time of mapping. Both the protoneutron star and the postshock region have
reached a quasi-equilibrium state, and the underlying space-time changes only very
slowly and secularly, allowing us to carry out subsequent high-resolution GRMHD
simulations that assume a fixed background space-time for about 10-20 ms. The
resolution of the AMR box covering the shear layer of the protoneutron star in
this initial simulation is dx = 750 m, but to resolve the FGM of the MRI for the
chosen initial magnetic field of 10^10 G, a linear resolution of at least dx = 100 m
is required. This is why the common method of obtaining the field strength
necessary to power a magnetorotational explosion (>10^15 G) has been flux
compression ($B \propto \rho^{2/3}$; amplification by a factor of about 10^3;
ρ is the density of the gas in the collapsing core) from unrealistically high
seed fields (B > 10^12 G precollapse).
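The flux-compression scaling B ∝ ρ^(2/3) links the amplification factor to the density contrast, and it explains why a 10^15 G target has traditionally required a 10^12 G seed. The sketch below only evaluates that power law. The densities in the example are illustrative round numbers, not values taken from model E25.

```python
def compression_amplification(rho_initial, rho_final):
    """Field amplification for ideal flux compression, B proportional to rho**(2/3)."""
    return (rho_final / rho_initial) ** (2.0 / 3.0)


def required_density_contrast(amplification):
    """Density ratio needed to reach a given amplification factor."""
    return amplification ** 1.5


# Illustrative densities in g/cm^3 (assumed round numbers, not from the model).
amp = compression_amplification(1e9, 1e14)   # about 2e3
contrast = required_density_contrast(1e3)   # about 3e4
seed_needed = 1e15 / 1e3                    # seed (G) that compression alone would need
```

A factor of about 10^3 therefore corresponds to a density increase of roughly 10^4.5. Reaching magnetar strength by compression alone needs a precollapse seed of order 10^12 G.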

Background flow stability analysis. A magnetized fluid is unstable to weak-field
shearing modes in the presence of a negative angular velocity gradient that is
not compensated for by compositional or entropy gradients of the fluid. At the
time of mapping of the initial AMR simulation to the high-resolution domain,
the plasma in the shocked region around the protoneutron star is locally unstable
to weak-field shearing modes, as given by

$$C_{\rm MRI} = \left(\omega_{\rm BV}^2 + r\,\frac{d\Omega^2}{dr}\right)\Big/2 < 0$$

(refs 2, 34, 37). Here, C_MRI is the stability criterion of the MRI; ω_BV is the
Brunt-Väisälä frequency indicating convective stability/instability; r dΩ²/dr
characterizes the rotational shear; and Ω is the angular velocity. We follow
refs 11 and 37 and calculate the stability criterion C_MRI, as well as the
wavelength (λ_FGM) and growth time (τ_FGM) of the FGM of the MRI, in
two-dimensional x-y and x-z slices through our three-dimensional domain.
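As a rough illustration of how the criterion is evaluated on a profile, the sketch below computes C_MRI with centred finite differences on a one-dimensional radial grid. It is not the authors' two-dimensional slice analysis. The angular-velocity law (Ω ∝ r^(-3/2)), the neutral stratification (ω_BV² = 0) and the radii are assumptions chosen for the example.

```python
def mri_criterion(radii, omega, omega_bv_sq):
    """C_MRI = (omega_BV^2 + r dOmega^2/dr) / 2 at interior points (centred differences)."""
    result = []
    for i in range(1, len(radii) - 1):
        r = radii[i]
        d_omega_sq = (omega[i + 1] ** 2 - omega[i - 1] ** 2) / (radii[i + 1] - radii[i - 1])
        result.append(0.5 * (omega_bv_sq[i] + r * d_omega_sq))
    return result


# Assumed example: Keplerian-like rotation, neutrally stratified (omega_BV^2 = 0).
radii = [10.0 + 0.5 * i for i in range(21)]           # km
omega = [2000.0 * (r / 10.0) ** -1.5 for r in radii]  # rad/s
omega_bv_sq = [0.0] * len(radii)                      # s^-2
c_mri = mri_criterion(radii, omega, omega_bv_sq)
unstable = all(c < 0 for c in c_mri)                  # negative everywhere: MRI-unstable
```

Raising ω_BV² well above |r dΩ²/dr| flips the sign. This matches the text's point that strong entropy or composition gradients can stabilize the shear.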

To better approximate the background flow in our three-dimensional AMR
stellar collapse simulation, we average in space and time. We first carry out a
spatial averaging step and calculate averaged versions of the state variables of
our simulation (for example, the spatially averaged density ρ_i) at every time
step. For that, we choose a centred stencil that takes into account three points in
each direction (this is the maximum number of points that we have available at
AMR component boundaries). Because this is insufficient to get a large enough
sample of points for the averaging procedure, we also calculate a moving time
average of the form $\rho_{{\rm av},i} = \alpha\rho_i + (1-\alpha)\,\rho_{{\rm av},i-1}$,
where i denotes the current time step and i − 1 the previous one. We choose a
weight function for each data set in the moving average as
$\alpha = 2\,(n\,\Delta t/\Delta t_{\rm coarse} + 1)^{-1}$, where Δt is the time step on
the current refinement level, and Δt_coarse is the time step of the coarsest level.
This choice of weight function guarantees that 86% of the data in the average
comprise the last n time-step data sets. The time-step size in our AMR simulation
on the refinement level that contains the shear layer around the protoneutron
star is Δt = 5 × 10^-4 ms, and we choose n = 2,000, ensuring temporal
averaging over a timescale of about 1 ms. We calculate C_MRI, λ_FGM and τ_FGM from
the space and time averages of the state variables in our simulation (Extended
Data Fig. 2).
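The 86% figure is a property of an exponentially weighted moving average with α = 2/(N + 1). The most recent N samples carry a total weight of 1 − (1 − α)^N, which tends to 1 − e^(-2) ≈ 0.865 for large N. The sketch below checks this numerically. Taking N = n Δt/Δt_coarse as the effective sample count is our reading of the weight function, not a statement from the text, and the constant input series is only a placeholder.

```python
import math


def moving_average(samples, alpha):
    """rho_av,i = alpha * rho_i + (1 - alpha) * rho_av,i-1, seeded with the first sample."""
    avg = samples[0]
    history = [avg]
    for value in samples[1:]:
        avg = alpha * value + (1.0 - alpha) * avg
        history.append(avg)
    return history


def weight_of_last(n_samples, alpha):
    """Total weight carried by the most recent n_samples in the moving average."""
    return 1.0 - (1.0 - alpha) ** n_samples


n = 2000
alpha = 2.0 / (n + 1)
fraction = weight_of_last(n, alpha)            # close to 1 - exp(-2)
limit = 1.0 - math.exp(-2.0)
smoothed = moving_average([1.0] * 10, alpha)   # a constant series stays constant
```

With Δt = 5 × 10^-4 ms, n = 2,000 steps span 1 ms. That matches the averaging timescale quoted in the text.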
Mapping to a high-resolution computational domain. Next, we map the
configuration to a three-dimensional domain with uniform spacing of the form
x, y, z = [−66.5 km, 66.5 km] for four resolutions, h = [500 m, 200 m, 100 m, 50 m].
To guarantee divergence-free initial data for the magnetic field, we carry out a
constraint projection step after we have interpolated the magnetic field to the new
domain. This is technically challenging as we have to make sure that all operators
used in the projection are consistent in their definition with the discrete form
of the divergence operator maintained in our specific implementation
of constrained transport. We use a discrete analogue of the Helmholtz
decomposition to decompose the magnetic field into a discrete curl, ∇_h ×, and
a discrete gradient, ∇_h:

$$\mathbf{B} = \nabla_h \times \mathbf{A} + \nabla_h \Phi \qquad (1)$$

where Φ is a discrete scalar field. The discrete divergence, ∇_h ·, of equation (1)
leads to a discrete Poisson equation:

$$\nabla_h \cdot \mathbf{B} = \Delta_h \Phi \qquad (2)$$

where Δ_h is the discrete Laplace operator. We solve equation (2), augmented with
homogeneous Dirichlet boundary conditions, to machine precision for Φ using
the conjugate gradient solver provided by the PETSc library in combination
with the parallel algebraic multigrid preconditioner HYPRE. We then obtain a
divergence-free field, B′, from the projection B′ = B − ∇_h Φ. Finally, we recompute
∇_h · B′ to check that it is zero to floating-point precision.
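The key point of the projection is operator consistency. If Δ_h is defined as ∇_h · ∇_h, then subtracting ∇_h Φ removes the discrete divergence exactly, up to solver tolerance. The sketch below demonstrates this in two dimensions with a face-centred (staggered) field, cell-centred Φ and homogeneous Dirichlet ghost cells. It is a simplified stand-in, not the Einstein Toolkit's constrained-transport operators. A plain Gauss-Seidel iteration replaces PETSc's conjugate gradient solver with the HYPRE preconditioner, and the random initial field is an arbitrary test input.

```python
import random


def divergence(bx, by, nx, ny, h):
    """Cell-centred discrete divergence of a face-centred 2D field."""
    return [[(bx[i + 1][j] - bx[i][j]) / h + (by[i][j + 1] - by[i][j]) / h
             for j in range(ny)] for i in range(nx)]


def gradient(phi, nx, ny, h):
    """Face-centred discrete gradient; ghost cells outside the box hold phi = 0."""
    def p(i, j):
        return phi[i][j] if 0 <= i < nx and 0 <= j < ny else 0.0
    gx = [[(p(i, j) - p(i - 1, j)) / h for j in range(ny)] for i in range(nx + 1)]
    gy = [[(p(i, j) - p(i, j - 1)) / h for j in range(ny + 1)] for i in range(nx)]
    return gx, gy


def solve_poisson(rhs, nx, ny, h, tol=1e-13, max_iter=50000):
    """Gauss-Seidel for the 5-point Laplacian (= divergence of the gradient above)."""
    phi = [[0.0] * ny for _ in range(nx)]
    for _ in range(max_iter):
        max_change = 0.0
        for i in range(nx):
            for j in range(ny):
                s = 0.0
                if i > 0:
                    s += phi[i - 1][j]
                if i < nx - 1:
                    s += phi[i + 1][j]
                if j > 0:
                    s += phi[i][j - 1]
                if j < ny - 1:
                    s += phi[i][j + 1]
                new = (s - rhs[i][j] * h * h) / 4.0
                max_change = max(max_change, abs(new - phi[i][j]))
                phi[i][j] = new
        if max_change < tol:
            break
    return phi


def project(bx, by, nx, ny, h):
    """Return B' = B - grad_h(phi), where lap_h(phi) = div_h(B)."""
    phi = solve_poisson(divergence(bx, by, nx, ny, h), nx, ny, h)
    gx, gy = gradient(phi, nx, ny, h)
    bx_new = [[bx[i][j] - gx[i][j] for j in range(ny)] for i in range(nx + 1)]
    by_new = [[by[i][j] - gy[i][j] for j in range(ny + 1)] for i in range(nx)]
    return bx_new, by_new


random.seed(1)
nx = ny = 8
h = 1.0
bx = [[random.uniform(-1.0, 1.0) for _ in range(ny)] for _ in range(nx + 1)]
by = [[random.uniform(-1.0, 1.0) for _ in range(ny + 1)] for _ in range(nx)]
before = max(abs(v) for row in divergence(bx, by, nx, ny, h) for v in row)
bx2, by2 = project(bx, by, nx, ny, h)
after = max(abs(v) for row in divergence(bx2, by2, nx, ny, h) for v in row)
```

If the Laplacian were built independently of the divergence and gradient stencils (for example, a cell-centred gradient paired with a face-based divergence), the final divergence would stay at truncation-error level rather than at solver precision.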

High-resolution turbulence simulations. We perform ideal, fixed background
space-time, GRMHD simulations using the open-source Einstein Toolkit
with WENO5 reconstruction, the HLLE Riemann solver and constrained
transport for maintaining ∇ · B = 0. We use the K_0 = 220 MeV variant of the
finite-temperature nuclear equation of state of ref. 43, and the neutrino
leakage/heating approximations described in refs 44, 45, with a heating scale
factor f_heat = 1.0. We perform simulations on a domain with uniform spacing of
the form x, y = [0 km, 66.5 km] and z = [−66.5 km, 66.5 km] for four resolutions,
h = [500 m, 200 m, 100 m, 50 m], in three-dimensional quadrant symmetry (90°
rotational symmetry in the x-y plane). We keep all variables at the boundary
fixed in time. This is justifiable for several reasons. First, the accretion
boundary flow itself only changes on timescales longer than those simulated.
Second, the fast magnetosonic speed (Extended Data Figs 5d and 6c) is of the
order of a few per cent of the speed of light throughout the high-resolution
computational domain. This implies a boundary crossing time for the simulation
box of about 20 ms, which leaves the results in the shear layer unaltered by
boundary effects for the simulated times of 10 ms.
Additionally, as the cylindrically rotating flow in the shocked region is rotating in
and out of the purely Cartesian boundary zones, sound waves can be reflected at
the boundaries. Although these reflections are not necessarily unrealistic, as there
will be perturbations in the shocked region of any rotating iron core, they pose an
additional complication for the numerical stability of the simulations. We find
these reflections to be minimal in the hydrodynamical variables themselves, but
they do cause spurious oscillations in the magnetic field towards the boundary
zones. To prevent these oscillations at the outer boundary, without affecting the
solution in the shear layer around the protoneutron star, we apply diffusivity at
the level of the induction equation for the magnetic field via a modified Ohm's
law. We choose E = −v × B + ηJ, where J = ∇ × B is the three-current density; we
set η = η_0 (0.5 + 0.5 tanh((r − r_diff)/b)) with η_0 = 10^7, r_diff = 40 km and
b = 3 km. That is, we apply diffusivity only in a region outside of radius r_diff
and transition smoothly over a blending zone with width b to no diffusivity
inside r_diff.
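Both boundary arguments reduce to short calculations. The sketch below evaluates two things. First, the crossing-time estimate L/(f c) for the box edge L = 66.5 km, with the fast magnetosonic speed taken as a fraction f of the speed of light, where f = 0.01 is an assumed stand-in for "a few per cent". Second, the tanh blending profile for the diffusivity, with r_diff = 40 km and b = 3 km from the text. The diffusivity is returned in units of η_0, so no value of η_0 is assumed.

```python
import math

C_KM_PER_S = 299792.458  # speed of light in km/s


def crossing_time_ms(box_km, fraction_of_c):
    """Time (ms) for a signal moving at fraction_of_c * c to cross box_km."""
    return box_km / (fraction_of_c * C_KM_PER_S) * 1e3


def diffusivity_profile(r_km, r_diff_km=40.0, b_km=3.0):
    """eta / eta_0 = 0.5 + 0.5 * tanh((r - r_diff) / b): ~0 inside r_diff, ~1 outside."""
    return 0.5 + 0.5 * math.tanh((r_km - r_diff_km) / b_km)


t_cross = crossing_time_ms(66.5, 0.01)   # roughly 22 ms
inner = diffusivity_profile(20.0)        # inside the shear-layer region: negligible
edge = diffusivity_profile(40.0)         # exactly half at r_diff
outer = diffusivity_profile(60.0)        # near the box edge: essentially eta_0
```

At r = 20 km the factor is of order 10^-6. The resistive term is then effectively switched off where the MRI-driven turbulence develops, which is the design goal stated in the text.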
Turbulent kinetic and magnetic energy spectra. We compute spectra of the
turbulent kinetic and magnetic energy as instantaneous snapshots using the
discrete Fourier transform

$$\hat{\mathbf{u}}(\mathbf{k}) = \sum_{\mathbf{x}} \mathbf{u}(\mathbf{x})\,\exp\!\left(-\frac{2\pi i}{L}\,\mathbf{k}\cdot\mathbf{x}\right)$$

(ref. 47), where u is a vector field, L is the extent of the computational box,
and N the number of grid points in the computational box. The spectra shown in
Fig. 3 are densitized to better reflect the overall energy contained in the
turbulent kinetic motion and the magnetic field. We show the spectra of the
non-densitized turbulent velocity in Extended Data Fig. 7a, and of the
non-densitized magnetic field in Extended Data Fig. 7b, and also window the data
to account for the non-periodicity at the boundaries of our computational domain.
For that, we use a mollifier of the form

$$m(x) = \exp\!\left[1 - \left(1 - \left(\frac{|x-d|}{d}\right)^2\right)^{-1}\right]$$

and respectively for y and z. This effectively blends the data to zero over a
stencil width d at the outer boundary. We choose d = 3, but note that other
choices yield similar results. These non-densitized and windowed spectra
illustrate that the lack of an exponential turnoff at large k in the turbulent
kinetic energy in Fig. 3 is due to the inclusion of the nearly discontinuous
density fall-off at the edge of the protoneutron star core (at r = 12 km) in the
calculation of the spectrum for Fig. 3, and to the non-periodicity of our
computational domain. The non-densitized and windowed turbulent kinetic energy
spectrum in Extended Data Fig. 7 is compensated for k^(-5/3) scaling (as expected
according to Kolmogorov theory). We observe a slightly steeper scaling than
k^(-5/3).
Within the first 3 ms, there is a rapid transition into a fully turbulent state at large 
k (Fig. 3b and Extended Data Fig. 7a). Afterwards, the turbulent kinetic energy 
decreases at large k and the spectrum gradually evolves towards a steeper fall-off. 
There is no increase in the turbulent kinetic energy at small values of k at late times. 
The magnetic energy, similarly to the turbulent kinetic energy, peaks at large k at 
t — tmap © 3 ms, which correlates well with the observed saturation of the maximum 
toroidal field shown in Fig. 1. Subsequently, the magnetic energy at small k grows 
first exponentially and then linearly with time. This picture is consistent with
energy being extracted from the turbulent kinetic motion at large k and being 
pumped into an inverse cascade that leads to growth of magnetic-field energy at 
small values of $k$. As the kinematic phase ends and transitions into saturation,
magnetic fields and numerical resistivity become important for the evolution.
This may explain the transition to linear growth. We also observe a 2-ms modulation
superposed on the $k = 4$ exponential growth, which corresponds roughly
to the Alfvén crossing time across the shear layer ($t_{\rm A,shear} \approx 2$ ms).
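A minimal NumPy sketch of the windowed, shell-binned spectrum described above follows. It assumes a uniform $N^3$ grid, integer (dimensionless) wavenumbers binned into spherical shells, the mollifier applied separably in $x$, $y$ and $z$, and a factor 1/2 in the energy. It computes only the non-densitized spectra. The function names, the shell binning and the normalization are our choices, not the authors' code (which is distributed via http://stellarcollapse.org).

```python
import numpy as np


def mollifier_1d(n, d=3):
    """1D window: 1 in the interior, blended smoothly to 0 over d cells at each end.

    Uses m(x) = exp[1 - 1 / (1 - (|x - d| / d)**2)] for x = 0, ..., d - 1
    (x counted in cells from the boundary), mirrored at the far boundary.
    """
    w = np.ones(n)
    if d == 0:
        return w                            # no windowing
    x = np.arange(d, dtype=float)
    s = np.abs(x - d) / d                   # 1 at the boundary, decreasing inward
    with np.errstate(divide="ignore"):
        edge = np.exp(1.0 - 1.0 / (1.0 - s**2))   # exp(-inf) = 0 at the boundary
    w[:d] = edge
    w[n - d:] = edge[::-1]
    return w


def energy_spectrum(u, d=3):
    """Windowed, shell-binned spectrum of a vector field u with shape (3, N, N, N).

    Returns integer shell wavenumbers k and E(k) = sum over the shell of
    0.5 * |u_hat(k)|**2, with u_hat the DFT normalized by N**3.
    """
    n = u.shape[1]
    w1 = mollifier_1d(n, d)
    window = w1[:, None, None] * w1[None, :, None] * w1[None, None, :]
    u_hat = np.fft.fftn(u * window, axes=(1, 2, 3)) / n**3
    power = 0.5 * np.sum(np.abs(u_hat) ** 2, axis=0)
    k1 = np.fft.fftfreq(n, d=1.0 / n)       # integer (dimensionless) wavenumbers
    kx, ky, kz = np.meshgrid(k1, k1, k1, indexing="ij")
    shell = np.rint(np.sqrt(kx**2 + ky**2 + kz**2)).astype(int)
    E = np.bincount(shell.ravel(), weights=power.ravel())
    return np.arange(E.size), E
```

For a single Fourier mode $u_x = \cos(2\pi \cdot 4\,x/N)$, the unwindowed spectrum (`d=0`) has all of its energy in the $k = 4$ shell. With `d=3`, the peak stays at $k = 4$ and only a small amount of energy leaks into neighbouring shells.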
Angle-averaged magnetic flux and poloidal current. We compute the
two-dimensional angle-averaged (in $\phi$) magnetic flux and poloidal current to determine
which magnetic-field structures are global in $\phi$ (Extended Data Figs 3 and 4). The
magnetic flux is computed as $\Psi = \int \varpi B_z\, d\varpi$ and the current as $J = \nabla \times B$. The
isocontours of the magnetic flux represent the poloidal field lines, while the
poloidal current approximates the toroidal magnetic field. We find that the shear layer
of the protoneutron star distorts the initial poloidal magnetic field of the iron core,
but we find no emerging global poloidal field created by turbulence. The toroidal
field (poloidal current), however, does show a global structure that roughly fills
the width of the shear layer in the polar region of our simulation, supporting the
idea that the toroidal magnetar-strength field in our simulations (see also Fig. 4)
truly is global in $\phi$.
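The angle averaging and flux integral can be sketched in a few lines. This sketch assumes the data have already been interpolated onto a uniform cylindrical $(\varpi, \phi, z)$ grid starting at $\varpi = 0$ and that a cumulative trapezoidal rule is adequate. The array layout and helper names are hypothetical. The overall factor $2\pi$ is omitted to match the expression for $\Psi$ in the text.

```python
import numpy as np


def angle_average(field):
    """Average a field sampled on a (n_varpi, n_phi, n_z) cylindrical grid over phi."""
    return field.mean(axis=1)


def poloidal_flux(bz_avg, varpi):
    """Psi(varpi, z) = integral from 0 to varpi of varpi' * B_z dvarpi'.

    bz_avg has shape (n_varpi, n_z); varpi is the 1D grid of cylindrical radii
    starting at 0. Uses a cumulative trapezoidal rule along varpi.
    """
    integrand = varpi[:, None] * bz_avg
    steps = 0.5 * (integrand[1:] + integrand[:-1]) * np.diff(varpi)[:, None]
    psi = np.zeros_like(integrand)
    psi[1:] = np.cumsum(steps, axis=0)
    return psi
```

Contours of the returned `psi` in the $\varpi$-$z$ plane then trace the angle-averaged poloidal field lines. For a uniform $B_z = 1$, the result is $\Psi = \varpi^2/2$.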

Limitations of this study. The limitations of this study are the finite resolution of
the simulations (most visible in the not-fully-converged saturation magnetic
field) and the sensitivity of the detailed turbulent state to the numerical methods.
In addition, the impact of the imposed 90° rotational symmetry remains to be investigated.
Ultimately, high-resolution simulations such as these have to be embedded back
into a full-star simulation to determine the detailed shock revival and explosion
geometry.

Code availability. All computer code used here that is not already freely available, 
and the initial data, are available at http://stellarcollapse.org. 
Extended Data Figure 1 | Evolution of the maximum poloidal magnetic
field. Both panels show the maximum poloidal magnetic field, $B^{\rm pol}$, as a
function of time for the four resolutions: 500 m, 200 m, 100 m and 50 m.
a, The global maximum field. b, The maximum field in a thin layer above
and below the equatorial plane (−7.5 km < z < 7.5 km). The purple line
indicates exponential growth with an e-folding time of 0.5 ms.
Extended Data Figure 2 | Background flow stability analysis.
a, b, The stability criterion 20 ms after core bounce for the initial
stellar collapse simulation. a, A two-dimensional x-y slice (z = 0) through
the three-dimensional domain; b, an x-z slice (y = 0). Yellow and red
indicate regions that are stable to shearing modes; dark blue and light blue
indicate unstable regions. c, The wavelength, $\lambda_{\rm FGM}$, of the fastest-growing mode (FGM) of the
MRI. d, The growth time of the FGM, $\tau_{\rm FGM}$. Panels c and d are zoomed in
on the shear layer around the protoneutron star.
Extended Data Figure 3 | Angle-averaged poloidal magnetic current
and magnetic flux. All panels show r-z slices (cylindrical coordinates,
angle-averaged in $\phi$) of the poloidal magnetic current ($J^{\rm pol}$, colour-coded)
and superposed contours of magnetic flux (black lines) at
$t - t_{\rm map} = 10.3$ ms (final simulated time). a, The 500-m simulation; b, the
200-m simulation; c, the 100-m simulation; d, the 50-m simulation.
Extended Data Figure 4 | Angle-averaged poloidal magnetic current and velocity vectors. The figure shows r-z slices (cylindrical coordinates, angle-averaged in $\phi$) of the poloidal magnetic current ($J^{\rm pol}$, colour-coded) and superposed velocity vectors (red arrows) at $t - t_{\rm map} = 10.3$ ms (final simulated time).
Extended Data Figure 5 | AMR stellar collapse simulation. All panels
show profiles along the x direction of the initial stellar collapse simulation,
20 ms after core bounce. a, Density ($\rho$); b, entropy ($s$; $k_{\rm B}$ is the Boltzmann
constant); c, angular velocity; d, fast magnetosonic speed.
Extended Data Figure 6 | AMR stellar collapse simulation. All panels
show profiles along the z direction of the initial stellar collapse simulation, 
20 ms after core bounce. a, Density; b, entropy; c, fast magnetosonic speed. 
Extended Data Figure 7 | Non-densitized turbulent kinetic and
electromagnetic energy spectra. a, A time series of non-densitized
turbulent kinetic energy spectra, $E_{\rm kin}(k)$, compensated for Kolmogorov
scaling ($k^{-5/3}$), as a function of the dimensionless wavenumber $k$. b, A time
series of non-densitized magnetic energy spectra, $E_{\rm mag}(k)$, as a function of
the dimensionless wavenumber $k$. In both panels, the initial spectrum at
$t - t_{\rm map} = 0$ ms (dashed black line) is shown for reference.
Multi-element logic gates for trapped-ion qubits


T. R. Tan, J. P. Gaebler, Y. Lin, Y. Wan, R. Bowler, D. Leibfried & D. J. Wineland


Precision control over hybrid physical systems at the quantum
level is important for the realization of many quantum-based
technologies. In the field of quantum information processing (QIP)
and quantum networking, various proposals discuss the possibility
of hybrid architectures where specific tasks are delegated to the
most suitable subsystem. For example, in quantum networks, it may
be advantageous to transfer information from a subsystem that has
good memory properties to another subsystem that is more efficient
at transporting information between nodes in the network. For
trapped ions, a hybrid system formed of different species introduces
extra degrees of freedom that can be exploited to expand and refine
the control of the system. Ions of different elements have previously
been used in QIP experiments for sympathetic cooling, creation of
entanglement through dissipation, and quantum non-demolition
measurement of one species with another. Here we demonstrate
an entangling quantum gate between ions of different elements
which can serve as an important building block of QIP, quantum
networking, precision spectroscopy, metrology, and quantum
simulation. A geometric phase gate between a ⁹Be⁺ ion and a
²⁵Mg⁺ ion is realized through an effective spin-spin interaction
generated by state-dependent forces induced with laser beams.
Combined with single-qubit gates and same-species entangling
gates, this mixed-element entangling gate provides a complete set
of gates over such a hybrid system for universal QIP. Using a
sequence of such gates, we demonstrate a CNOT (controlled-NOT)
gate and a SWAP gate. We further demonstrate the robustness of
these gates against thermal excitation and show improved detection
in quantum logic spectroscopy. We also observe a strong violation
of a CHSH (Clauser-Horne-Shimony-Holt)-type Bell inequality
on entangled states composed of different ion species.

Trapped ions of different elements vary in mass, internal atomic
structure, and spectral properties, features that can make certain
species suited for particular tasks such as storing quantum information,
high-fidelity readout, fast logic gates, or interfacing between local
processors and photon interconnects. One important advantage of a hybrid
system incorporating trapped ions of different elements is the ability
to manipulate and measure one type of qubit using laser beams with
negligible effects on the other, since the resonant transition wavelengths
differ substantially. When scaling trapped-ion systems to greater
numbers and density of ions, it will be advantageous to perform fluorescence
detection on individual qubits without inducing decoherence on
neighbouring qubits due to uncontrolled photon scattering. To provide this
function in a hybrid system, one can use an entangling gate to transfer
the qubit states to another ion species, which is then detected without
perturbing the qubits. This readout protocol could be further
generalized to error correction schemes by extracting the error syndromes
to the readout species while the computational qubits remain in the
code. Another application could be in building photon interconnects
between trapped-ion devices. Here, one species may be better suited for
memory while the other is more favourable for coupling to photons.

A mixed-element gate can also improve the readout in quantum
logic spectroscopy (QLS). In conventional quantum logic readout,
the state of the clock or qubit ion is transferred to a motional state and
in turn transferred to the detection ion, which is then detected with
state-dependent fluorescence. In this case, the transfer fidelity directly
depends on the purity of the motional state. In contrast, transfer using
the gate discussed here can be insensitive to the motion, as long as
the ions are in the Lamb-Dicke regime. This advantage extends to
entanglement-assisted quantum non-demolition (QND) readout of
qubit or clock ions, which can lower the overhead in time and number
of readout ions as the number of clock ions increases.

In our experiment, we use a beryllium (⁹Be⁺) ion and a magnesium
(²⁵Mg⁺) ion separated by approximately 4 μm along the axis of a linear
Paul trap. The addressing lasers for each ion (wavelength λ ≈ 313 nm
for ⁹Be⁺ and λ ≈ 280 nm for ²⁵Mg⁺) illuminate both ions. The qubits
are encoded in hyperfine states of the ions. We choose
$|F = 2, m_F = 0\rangle = |\downarrow\rangle_{\rm Be}$ and $|1, 1\rangle = |\uparrow\rangle_{\rm Be}$ as the ⁹Be⁺ qubit states, and
$|2, 0\rangle = |\downarrow\rangle_{\rm Mg}$ and $|3, 1\rangle = |\uparrow\rangle_{\rm Mg}$ for the ²⁵Mg⁺ qubit. The Coulomb
coupling between the ions gives rise to two shared motional normal
modes along the trap axis. A magnetic field of 11.945 mT is applied at
45° with respect to the trap axis. At this field, the ⁹Be⁺ qubit transition
frequency is, to first order, insensitive to external magnetic field
fluctuations. The magnetic field sensitivity of the ²⁵Mg⁺ qubit is
approximately 430 kHz mT⁻¹. By measuring the decay of Ramsey
interference fringes versus the time between the Ramsey pulses on each
qubit transition, we determine the ⁹Be⁺ qubit's coherence time to be
approximately 1.5 s. The ²⁵Mg⁺ qubit coherence time is approximately
6 ms, limited by magnetic field fluctuations. We verified that the phase
and contrast of Ramsey experiments on one species do not change
measurably in the presence of light addressing the other species. This
shows that the spectral separation is sufficient to isolate the species.

Entanglement between the two ions is achieved through a Mølmer-Sørensen
(MS) spin-spin interaction induced by laser-driven stimulated
Raman transitions. Starting in the state $|\uparrow\rangle_{\rm Be}|\uparrow\rangle_{\rm Mg} = |\uparrow\uparrow\rangle$, the
interaction can produce the Bell state $\Phi_+ = (|\downarrow\downarrow\rangle + |\uparrow\uparrow\rangle)/\sqrt{2}$ (see
Methods).

The laser beam configurations to induce coherent Raman transitions
are analogous for each element; for brevity, we only describe
the configuration for ⁹Be⁺ (red in Fig. 1). Three laser beams, labelled
by their wave vectors $k_{1,\rm co1}$, $k_{1,\rm co2}$ and $k_{1,90}$, are derived from a
single laser with wavelength λ ≈ 313 nm. Beams $k_{1,\rm co1}$ and $k_{1,\rm co2}$ are
co-propagating such that their wave vector differences with respect
to the $k_{1,90}$ beam are aligned along the trap axis. In this configuration,
only the axial motional modes interact with the laser beams. The two
co-propagating beams induce detuned blue and red sideband Raman
transitions, respectively, when paired with the $k_{1,90}$ beam to implement
the MS interaction (see Methods).

Figure 1 | Configuration of laser beams for the mixed-element
entangling gate. For the ⁹Be⁺ ion, 313 nm laser beams (in red)
simultaneously induce near-resonant red and blue sideband transitions.
Similarly, for ²⁵Mg⁺, 280 nm beams (in green) induce sideband transitions.
When all beams are applied simultaneously this implements the MS
spin-spin interaction (see Methods). Each set of qubit-addressing laser beams
is set up such that the wave vector differences $k_{j,90} - k_{j,\rm co1}$ and
$k_{j,90} - k_{j,\rm co2}$ (j = 1, 2) are aligned in the same direction along the
trap axis such that only motional modes along this axis can be excited.

One important consideration in creating deterministic mixed-element
entanglement with the MS interaction driven by multiple laser
fields is the control over the relative optical phases at the ions'
locations. The basis states $|+\rangle_j$, $|-\rangle_j$ and the state-dependent forces that
are applied to them (see Methods) depend on the optical phases of the
beams $k_{j,\rm co1}$, $k_{j,\rm co2}$ and $k_{j,90}$ (j = 1, 2) at the ion positions. Beams $k_{j,\rm co1}$
and $k_{j,\rm co2}$ are generated in the same acousto-optic modulator, one for
each ion species, and travel nearly identical paths. However, the $k_{j,90}$
beams take a substantially different path to reach the ions' locations.
Temperature drift and acoustic noise cause changes in the different
beam paths that lead to phase fluctuations in the MS interaction. These
fluctuations are slow on the timescale of a single gate but substantial
over the course of many experiments. To suppress these effects, we
embed the MS interaction in a Ramsey sequence implemented with
two π/2 carrier pulses induced by $k_{j,\rm co1}$ (solid arrows) and $k_{j,90}$ for each
qubit (Methods). The first set of pulses maps the $|\uparrow\rangle$ and $|\downarrow\rangle$ states
of each qubit onto the $|+\rangle_j$ and $|-\rangle_j$ states, whose phases are synchronized
with the MS interaction. The final set of pulses undoes this mapping
such that the action of this sequence is independent of the path length
differences as long as the differences are constant during the entire
sequence. In this case, the sequence produces a phase gate G that
implements $|\uparrow\uparrow\rangle \to |\uparrow\uparrow\rangle$, $|\uparrow\downarrow\rangle \to i|\uparrow\downarrow\rangle$, $|\downarrow\uparrow\rangle \to i|\downarrow\uparrow\rangle$ and $|\downarrow\downarrow\rangle \to |\downarrow\downarrow\rangle$.
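The text states that G, combined with single-qubit π/2 pulses, yields both a Bell state and a CNOT. Both statements can be checked with 4 × 4 matrices. The NumPy sketch below uses the basis order $|\uparrow\uparrow\rangle, |\uparrow\downarrow\rangle, |\downarrow\uparrow\rangle, |\downarrow\downarrow\rangle$ and takes G = diag(1, i, i, 1) from the text. Two modelling choices are our assumptions rather than the authors' pulse sequence: each π/2 pulse is a rotation about y, and for the CNOT we add phase corrections $S^\dagger$ = diag(1, −i) on both qubits. In an experiment, such z rotations would typically be absorbed into pulse phases.

```python
import numpy as np

# Basis order |up,up>, |up,down>, |down,up>, |down,down>; qubit 1 = 9Be+, qubit 2 = 25Mg+.
G = np.diag([1, 1j, 1j, 1])          # phase gate G as stated in the text
I2 = np.eye(2)
S_DAG = np.diag([1, -1j])            # z-phase correction (our assumption)


def ry(theta):
    """Carrier rotation exp(-i * theta * sigma_y / 2); ry(pi/2) models a pi/2 pulse."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]])


def concurrence_pure(state):
    """Concurrence 2|ad - bc| of a pure two-qubit state (1 means maximally entangled)."""
    a, b, c, d = state
    return 2 * abs(a * d - b * c)


# Bell state: pi/2 pulses on both qubits before and after G, starting from |up,up>.
psi0 = np.array([1, 0, 0, 0], dtype=complex)
R = np.kron(ry(np.pi / 2), ry(np.pi / 2))
psi = R @ G @ R @ psi0

# CNOT (control qubit 1, target qubit 2): G plus z corrections gives CZ,
# and pulses on the target before (-pi/2) and after (+pi/2) turn CZ into CNOT.
CZ = G @ np.kron(S_DAG, S_DAG)
CNOT = np.kron(I2, ry(np.pi / 2)) @ CZ @ np.kron(I2, ry(-np.pi / 2))

print("concurrence:", round(concurrence_pure(psi), 6))
print(np.round(CNOT.real, 6))
```

In this model, the state after G already has unit concurrence. The second set of π/2 pulses only rotates it locally, which is why the output is a Bell state for any phase of those pulses.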


Such a phase gate could also be implemented as in ref. 9 (on qubits
with magnetic-field-sensitive transitions). This requires fewer laser
beams but adds the technical difficulty of synchronizing the
state-dependent forces at the ion locations for both species.

Figure 2 | Pulse sequences for logic gates. a, Starting with the $|\uparrow\rangle_{\rm Be}|\uparrow\rangle_{\rm Mg}$
state, this pulse sequence generates a Bell state with G (blue-dashed box)
and single-qubit microwave (μw) gates. The notation (θ, φ) represents
the rotation angle and relative phase of each gate pulse. A parity oscillation
is induced by applying analysis π/2 pulses with a variable phase φ to
the created Bell state. To demonstrate the phase insensitivity of G, the
single-qubit gates and the analysis pulses are implemented by microwave
fields that are not phase synchronized to the optical phases. b, Pulse sequence
of a Ramsey experiment where a superposition state of a ⁹Be⁺ qubit is
coherently transferred to a ²⁵Mg⁺ qubit with a SWAP gate (black-dashed
box). Given G, either of the two qubits can be the target qubit of a CNOT
gate (green-dashed boxes) by applying single-qubit π/2 pulses to it.

Before applying the gate, the ions are first Doppler cooled in all three
directions. The axial motional modes are further cooled to near the
ground state by Raman sideband cooling on the ⁹Be⁺ ion. State
initialization into the qubits' $|\uparrow\rangle$ states and qubit state readout are
described in Methods. After each experiment repetition, we measure
one of the possible states: $|\uparrow\uparrow\rangle$, $|\uparrow\downarrow\rangle$, $|\downarrow\uparrow\rangle$ or $|\downarrow\downarrow\rangle$.

In a first experiment, we prepare the Bell state $\Phi_+$ with the MS
interaction (Fig. 1) and determine its fidelity by measuring the qubit
populations and the contrast of the parity oscillation produced by applying
'analysis' pulses. The analysis pulses are laser carrier transitions induced
by the non-co-propagating laser beams $k_{j,\rm co1}$ and $k_{j,90}$, such that the
relative phase defining the basis states of the MS interaction is stable with
respect to that of the analysis pulses for each experiment repetition.
We determine a Bell state fidelity of 0.979(1) (the number in
parentheses is the standard error of the mean). We also create a Bell state by
applying microwave carrier π/2 pulses on each qubit before and after
the operation G (red-dashed box in Fig. 2a), achieving a fidelity of
0.964(1). Following the procedure of ref. 25, we perform a CHSH-type
Bell-inequality test on this state, achieving a sum of correlations of
B = 2.70(2) > 2. This violation, measured on an entangled system
consisting of different elements, agrees with the predictions of quantum
mechanics while eliminating the detection loophole but not the
locality loophole.

The imperfections of the entangled states can be attributed to
multiple causes, which we investigate through calibration measurements
and numerical simulation. We estimate the error from imperfect state
preparation and detection to be 5 × 10⁻³ (see Methods). Other errors
are spontaneous photon scattering of ²⁵Mg⁺ (6 × 10⁻³) and ⁹Be⁺
(1 × 10⁻³), and heating of the motional mode due to electric field
noise (4 × 10⁻³) (ref. 27). Other known error sources include imperfect
single-qubit pulses, off-resonant coupling to spectator hyperfine
states and the other motional modes, mode frequency fluctuations,
qubit decoherence due to magnetic field fluctuations, laser intensity
fluctuations, optical phase fluctuations, and calibration errors. Each
of these sources contributes an error of the order of 10⁻³ or less. We find
close agreement between the experimental data and numerical
simulations that include the listed imperfections.

Figure 3 | Robustness of quantum logic readout against thermal
excitation. Shown is Rabi flopping of the ⁹Be⁺ ion detected on the ²⁵Mg⁺
ion with the motional modes cooled to Doppler temperatures, using the
two mapping procedures described in the text. $P(|\uparrow\rangle_{\rm Mg})$ is the probability
of finding the ²⁵Mg⁺ qubit in the $|\uparrow\rangle$ state versus the duration of the carrier
pulse on the ⁹Be⁺ qubit. The CNOT mapping technique, which makes use
of the mixed-species gate described here, performs better than the
conventional QLS procedure owing to its relative insensitivity to motional
excitation. Each data point represents 200 repetitions; error bars, s.e.m.

We use G to construct a CNOT gate by applying microwave π/2
pulses on one of the qubits before and after G (green-dashed boxes in
Fig. 2b) and use it to demonstrate qubit state mapping. The 'target' of
the CNOT gate is the qubit to which the single-qubit pulses are applied.
The CNOT gate inherits the robustness against motional excitation
from the MS gate. We compare the results obtained using the CNOT
gate with the method used in the conventional QLS procedure, where
a red-sideband π pulse is first applied to the ⁹Be⁺ ion, followed by a
red-sideband π pulse on the ²⁵Mg⁺ ion. Both procedures are calibrated
for the motional mode ground state. Figure 3 shows Rabi flopping of
the ⁹Be⁺ qubit as detected on the ²⁵Mg⁺ ion, which is initially prepared
in the $|\uparrow\rangle$ state. For the ions' motional modes cooled to Doppler
temperature (mean occupation number n̄ ≈ 4), the contrast of the
conventional QLS method (red dots) is reduced compared to transfer with the
CNOT gate (blue squares). In both of these mapping procedures, the
⁹Be⁺ qubit phase information is not accessible on the ²⁵Mg⁺ ion. To
preserve this phase information, we construct a SWAP gate that
interchanges the quantum states of the two qubits with three CNOT gates.
Figure 2b shows the pulse sequence of a Ramsey-type experiment
where the first Ramsey (microwave) π/2 pulse is applied to the ⁹Be⁺
ion and the second (microwave) π/2 pulse is applied to the ²⁵Mg⁺ ion
after implementing the SWAP gate. Ramsey fringes for the ions' axial
motional modes initialized to near the ground state (n̄ ≈ 0.05, blue
squares) and Doppler cooled (n̄ ≈ 4, red dots) are shown in Fig. 4. The
contrast at Doppler temperature is reduced because the Lamb-Dicke
limit is not rigorously satisfied. Through simulation with and without
the measured ²⁵Mg⁺ qubit decoherence, we determine that the loss of
contrast for the SWAP gate due to this decoherence is approximately
2%. For all three methods, the contrast could be somewhat improved
by calibrating all gates for the given motional temperature.
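Because G is symmetric in the two qubits, either qubit can serve as the CNOT target, and three alternating CNOTs give a SWAP. The NumPy sketch below assumes G = diag(1, i, i, 1) as stated in the text. As our own modelling choices, it also assumes y-axis π/2 pulses and phase corrections $S^\dagger$ = diag(1, −i). It checks that a ⁹Be⁺ superposition is moved onto the ²⁵Mg⁺ qubit, as in the Ramsey experiment of Fig. 2b. The experimental pulse sequence may differ in its phase conventions.

```python
import numpy as np

G = np.diag([1, 1j, 1j, 1])          # phase gate G from the text
S_DAG = np.diag([1, -1j])            # z-phase correction (our assumption)
I2 = np.eye(2)


def ry(theta):
    """Carrier rotation exp(-i * theta * sigma_y / 2)."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]])


CZ = G @ np.kron(S_DAG, S_DAG)       # symmetric under exchange of the qubits
# The target is whichever qubit receives the pi/2 pulses.
CNOT_12 = np.kron(I2, ry(np.pi / 2)) @ CZ @ np.kron(I2, ry(-np.pi / 2))   # target: 25Mg+
CNOT_21 = np.kron(ry(np.pi / 2), I2) @ CZ @ np.kron(ry(-np.pi / 2), I2)   # target: 9Be+
SWAP = CNOT_12 @ CNOT_21 @ CNOT_12

# Ramsey-type check: a 9Be+ superposition is moved onto the 25Mg+ qubit.
be = np.array([1, 1]) / np.sqrt(2)   # (|up> + |down>)/sqrt(2) on 9Be+
mg = np.array([1, 0])                # |up> on 25Mg+
out = SWAP @ np.kron(be, mg)
print(np.round(out.real, 6))
```

Unlike the sideband-based QLS transfer, this construction uses G three times, so each SWAP inherits the motional insensitivity of G but also accumulates three gates' worth of error.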

We have demonstrated a mixed-element entangling gate in which we
employ a Ramsey sequence to suppress loss of fidelity of the output
state due to low-frequency optical path length fluctuations. Using
this gate, we implement CNOT and SWAP operations between qubits of
different elements that are relatively robust against thermal excitation of
the motion. These and related techniques are potentially useful
for building a large-scale processor or quantum network using the
advantageous properties of different ion species. The entangling
technique should also be applicable to qubits with optical transitions
(for example, Ca⁺ or Sr⁺), or a combination of hyperfine qubits and
optical qubits, which can also make this technique useful for readout
in quantum logic clocks.

Similar work has also been carried out at the University of
Oxford on different isotopes of Ca⁺, where the same laser beams
can manipulate both isotopes simultaneously. The method presented
here uses two substantially different sets of laser beams with different
wavelengths, illustrating that cross-talk between operations on
different species can be negligible, and could be applied to take advantage
of the desirable features of each species.

Figure 4 | Ramsey experiments with SWAP gate. Shown are the Ramsey
fringes of the ²⁵Mg⁺ qubit after initializing the ⁹Be⁺ ion in the
$(|\uparrow\rangle + |\downarrow\rangle)/\sqrt{2}$ state and applying the SWAP gate. $P(|\uparrow\rangle_{\rm Mg})$ is the
probability of finding the ²⁵Mg⁺ qubit in the $|\uparrow\rangle$ state and φ is the relative
phase of the π/2 pulse applied to the ²⁵Mg⁺ qubit. The solid lines are fitted
curves with contrast of 94% for the ions initialized to the ground state
(mean occupation number n̄ ≈ 0.05) and 61% for the ions initialized to the
Doppler cooling temperature (n̄ ≈ 4). The phase offset depends on the
calibration of the SWAP gate and can be experimentally adjusted to any
value. Each data point represents 200 repetitions; error bars, s.e.m.
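The contrasts quoted for Fig. 4 come from fitted curves. One common way to extract a fringe contrast is a linear least-squares fit of $P(\phi) = a + b\cos\phi + c\sin\phi$, taking the contrast as the peak-to-peak amplitude $2\sqrt{b^2 + c^2}$. The sketch below uses that approach and assumes the fringes are sinusoidal in φ; the fit model is not stated in the text. The function name and the synthetic data are hypothetical.

```python
import numpy as np


def fit_ramsey_contrast(phi, p):
    """Fit P(phi) = a + b*cos(phi) + c*sin(phi) by linear least squares.

    Returns (contrast, offset, phase) with contrast = 2*sqrt(b**2 + c**2), the
    peak-to-peak fringe amplitude, and P = offset + contrast/2 * cos(phi + phase).
    """
    phi = np.asarray(phi, dtype=float)
    A = np.column_stack([np.ones_like(phi), np.cos(phi), np.sin(phi)])
    (a, b, c), *_ = np.linalg.lstsq(A, np.asarray(p, dtype=float), rcond=None)
    return 2.0 * np.hypot(b, c), a, np.arctan2(-c, b)


if __name__ == "__main__":
    # Synthetic fringes: 94% contrast, 200 repetitions per point (hypothetical data).
    rng = np.random.default_rng(1)
    phi = np.linspace(0, 2 * np.pi, 25, endpoint=False)
    p_true = 0.5 + 0.47 * np.cos(phi + 0.3)
    p_meas = rng.binomial(200, p_true) / 200
    contrast, offset, phase = fit_ramsey_contrast(phi, p_meas)
    print(f"contrast = {contrast:.3f}, offset = {offset:.3f}, phase = {phase:.3f}")
```

Writing the sinusoid in terms of $\cos\phi$ and $\sin\phi$ makes the fit linear, so no initial guess for the phase offset is needed. The phase offset is arbitrary anyway, since it depends on the SWAP calibration.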


Online Content Methods, along with any additional Extended Data display items and 
Source Data, are available in the online version of the paper; references unique to 
these sections appear only in the online paper. 
METHODS


Geometric phase gate. The Mølmer-Sørensen (MS) protocol requires
simultaneous excitation of a blue sideband transition with a detuning of δ and a red
sideband transition with a detuning of −δ for a selected motional mode (Fig. 1).
The excitation creates a forced harmonic oscillator interaction that displaces the
motional wavefunction in phase space in a manner that depends on the
internal qubit states. If the different displacements enclose a loop, the qubit states
pick up a geometric phase proportional to the state-dependent area of the enclosed
loop. We create an entangling logic gate by choosing appropriate geometric phase
differences between different qubit states.

Laser fields are used to induce coherent stimulated-Raman transitions between
the qubit states of each ion and the shared quantized degrees of motion. For each
qubit we can excite carrier transitions $|\downarrow, n\rangle \leftrightarrow |\uparrow, n\rangle$ that induce spin flips without
changing the motional Fock state n. A blue (red) sideband excitation flips the spin
while adding (removing) a quantum of motion; it is obtained by detuning the fields from the
carrier transition frequency by the motional frequency. The relative frequencies,
phases, and intensities of each set of laser beams (Fig. 1) for each qubit can be
adjusted with computer-controlled acousto-optic modulators (AOMs).
The ⁹Be⁺ Raman laser beams with a wavelength of λ ≈ 313 nm are approximately
480 GHz red detuned from the $S_{1/2}$ to $P_{1/2}$ electronic transition. The
λ ≈ 280 nm Raman laser beams for the ²⁵Mg⁺ ion are approximately 160 GHz blue
detuned from the $S_{1/2}$ to $P_{3/2}$ electronic transition. Carrier transitions can also
be implemented by microwave fields delivered from an antenna located outside
the vacuum chamber.

After transforming into the respective interaction frames of both qubits as well
as that of the shared motional mode, and dropping high-frequency
terms in the rotating-wave approximation, we can write the interaction in the
Lamb-Dicke limit as
$$H = \hbar \sum_{j=1,2} \Omega_j\, \sigma_j^{+}\left(a\, e^{-i(\delta_j t - \phi_{j,r})} + a^{\dagger} e^{i(\delta_j t + \phi_{j,b})}\right) + \mathrm{h.c.},$$
where j = 1, 2 denotes the two different ion species and $\Omega_j = \eta_j \Omega_{0,j}$, where $\Omega_{0,j}$ is the
carrier resonance Rabi frequency. The Lamb-Dicke parameter $\eta_j$ is equal to
$\Delta k_j z_{0,j} b_j$, where $b_j$ is the mode amplitude of the jth ion and $z_{0,j} = \sqrt{\hbar/(2 m_j \omega_z)}$; $m_j$
is the mass of the jth ion and $\omega_z$ is the frequency of the selected normal mode. The spin raising
operator is $\sigma_j^{+}$ and $a^{\dagger}$ is the creation operator for the relevant (harmonic) motional
mode.

The phases of the red (r) and blue (b) sideband interactions are
$\phi_{j,r(b)} = \Delta k_{j,r(b)} X_{0,j} + \Delta\phi_{j,r(b)}$, where $\Delta k_{j,r(b)}$ and $\Delta\phi_{j,r(b)}$ are the differences in wave vectors
and phases of the optical fields driving the red and blue sideband transitions,
respectively, and $X_{0,j}$ is the equilibrium position of the jth ion. After setting
$\Omega_1 = \Omega_2 = \Omega$ and $\delta_1 = \delta_2 = \delta$, and writing $\phi_{M,j} = (\phi_{j,r} - \phi_{j,b})/2$, the geometric phases
accumulated after a duration of $t_{\rm MS} = 2\pi/\delta$ for the four $|\pm\rangle_1|\pm\rangle_2$ basis states (defined
as the eigenstates of $\hat\sigma_{\phi,j} = \cos((\phi_{j,r} + \phi_{j,b})/2)\,\hat\sigma_{x,j} - \sin((\phi_{j,r} + \phi_{j,b})/2)\,\hat\sigma_{y,j}$) are
$$\Phi_{|{+}{+}\rangle,|{-}{-}\rangle} = \frac{8\pi\Omega^2}{\delta^2}\cos^2\!\left(\frac{\phi_{M,1} - \phi_{M,2}}{2}\right),\qquad
\Phi_{|{+}{-}\rangle,|{-}{+}\rangle} = \frac{8\pi\Omega^2}{\delta^2}\sin^2\!\left(\frac{\phi_{M,1} - \phi_{M,2}}{2}\right). \tag{1}$$
To maximize entangling gate speed, the geometric phases for the different parity
qubit states in equation (1) are set to differ by π/2. This is accomplished by adjusting
the phases of the radio frequencies driving the AOMs.
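Equation (1) can be checked numerically from the phase-space picture. For the Hamiltonian above, each ion is placed in an eigenstate $s_j = \pm 1$ of $\hat\sigma_{\phi,j}$. The motion then feels a force with complex amplitude $f = \Omega\sum_j s_j e^{i\phi_{M,j}}$, and the displacement $\alpha(t) = -(f^*/\delta)(e^{i\delta t} - 1)$ traces a circle that closes at $t_{\rm MS} = 2\pi/\delta$. The second-order Magnus term gives an accumulated phase equal to twice the enclosed area. The NumPy sketch below follows this route. The discretization, the choice $\phi_{M,1} = 0$ and the function names are our own.

```python
import numpy as np


def displacement_loop(f, delta, n=20001):
    """alpha(t) for H = hbar*(f e^{-i delta t} a + f* e^{i delta t} a^dagger), 0 <= t <= 2*pi/delta."""
    t = np.linspace(0.0, 2.0 * np.pi / delta, n)
    return -(np.conj(f) / delta) * (np.exp(1j * delta * t) - 1.0)


def loop_area(alpha):
    """Area enclosed by a closed phase-space path (shoelace formula)."""
    x, y = alpha.real, alpha.imag
    return 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))


def geometric_phase(s1, s2, Omega, delta, dphi_M):
    """Geometric phase (= 2 x enclosed area) for eigenvalues s1, s2 = +/-1 of sigma_phi.

    Uses phi_M,1 = 0 and phi_M,2 = dphi_M, so f = Omega * (s1 + s2 * exp(i * dphi_M)).
    """
    f = Omega * (s1 + s2 * np.exp(1j * dphi_M))
    return 2.0 * loop_area(displacement_loop(f, delta))


def closed_form(s1, s2, Omega, delta, dphi_M):
    """Equation (1): 8*pi*Omega^2/delta^2 times cos^2 (same parity) or sin^2 (opposite)."""
    trig = np.cos(dphi_M / 2) ** 2 if s1 == s2 else np.sin(dphi_M / 2) ** 2
    return 8.0 * np.pi * Omega**2 / delta**2 * trig


if __name__ == "__main__":
    Omega, delta = 1.0, 4.0   # delta = 4 * Omega, in arbitrary units
    print(geometric_phase(1, 1, Omega, delta, 0.0) - geometric_phase(1, -1, Omega, delta, 0.0))
```

Consider the case $\phi_{M,1} = \phi_{M,2}$. The phase difference between the two parity classes is then $8\pi\Omega^2/\delta^2$, so the π/2 condition gives δ = 4Ω. If the phases are set this way, the gate time of about 35 μs quoted below would correspond to δ/2π ≈ 29 kHz and Ω/2π ≈ 7 kHz.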

There are two axial modes: the lower-frequency mode ($\omega_z = 2\pi \times 2.5$ MHz),
where the ions oscillate in phase, and the higher-frequency mode ($2\pi \times 5.4$ MHz),
where the ions oscillate out of phase. The Lamb-Dicke parameters for the ⁹Be⁺
(²⁵Mg⁺) ion are 0.156 (0.265) and 0.269 (0.072), respectively, for the two modes.
We use the in-phase mode for our demonstration because the ²⁵Mg⁺ ion has a
larger normal-mode amplitude in this mode than in the out-of-phase mode. This results
in less spontaneous emission error for a given strength of the state-dependent
force. The gate time $t_{\rm MS}$ is approximately 35 μs.
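The quoted Lamb-Dicke parameters can be cross-checked with a textbook normal-mode calculation for two equal charges of different mass in a harmonic axial well. The sketch below uses the measured mode frequencies for $z_{0,j}$ and atomic masses of 9.012 u and 24.986 u. It also makes one assumption that is not stated in the text: $|\Delta k| = \sqrt{2}\cdot 2\pi/\lambda$, i.e. the $k_{j,90}$ beam crosses the co-propagating beams at 90°. Under these assumptions, the sketch reproduces all four quoted values to within about 0.001. It predicts a mode-frequency ratio of about 2.15, compared with 5.4/2.5 ≈ 2.16. Helper names are hypothetical.

```python
import numpy as np

HBAR = 1.054571817e-34                   # J s
AMU = 1.66053906660e-27                  # kg
M_BE, M_MG = 9.012 * AMU, 24.986 * AMU   # ion masses (electron mass neglected)


def axial_modes(m1, m2):
    """Normal modes of a two-ion axial crystal (equal charges, harmonic well).

    The axial Hessian is proportional to [[2, -1], [-1, 2]]; mass-weighting it
    gives squared frequencies (up to a common factor) and mode vectors b_j.
    Eigenvalues come back in ascending order, i.e. the in-phase mode first.
    """
    hessian = np.array([[2.0, -1.0], [-1.0, 2.0]])
    minv = np.diag([1 / np.sqrt(m1), 1 / np.sqrt(m2)])
    lam, vecs = np.linalg.eigh(minv @ hessian @ minv)
    return lam, vecs * np.sign(vecs[0])  # make the 9Be+ components positive


def lamb_dicke(delta_k, mass, omega, b):
    """eta_j = delta_k * z0_j * |b_j| with z0_j = sqrt(hbar / (2 m_j omega))."""
    return delta_k * np.sqrt(HBAR / (2 * mass * omega)) * abs(b)


lam, vecs = axial_modes(9.012, 24.986)          # masses in u; only ratios matter here
freq_ratio = np.sqrt(lam[1] / lam[0])           # compare with 5.4 MHz / 2.5 MHz
omegas = 2 * np.pi * np.array([2.5e6, 5.4e6])   # in-phase, out-of-phase (from the text)

# ASSUMPTION: Raman beams cross at 90 degrees, so |delta_k| = sqrt(2) * 2 pi / lambda.
dk = {"Be": np.sqrt(2) * 2 * np.pi / 313e-9, "Mg": np.sqrt(2) * 2 * np.pi / 280e-9}
eta = {
    (ion, mode): lamb_dicke(dk[ion], mass, omegas[mode], vecs[row, mode])
    for ion, mass, row in (("Be", M_BE, 0), ("Mg", M_MG, 1))
    for mode in (0, 1)
}

print(f"predicted frequency ratio: {freq_ratio:.3f}")
for (ion, mode), value in eta.items():
    label = "in-phase" if mode == 0 else "out-of-phase"
    print(f"{ion:>2} {label:>12}: eta = {value:.3f}")
```

The calculation also shows why the in-phase mode was chosen. The ²⁵Mg⁺ mode amplitude is about 0.93 in the in-phase mode but only about 0.37 in the out-of-phase mode.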

Calibration procedure for phase gate G. To produce the phase gate G, the phases
of the π/2 pulses for the Ramsey sequence must be referenced to the basis states
of the MS interaction defined by the optical phases. The phases must also account
for the AC Stark shifts induced by the laser beams that are used for the MS
interaction.

To calibrate these phases, we first perform the pulse sequence shown in the
blue-dashed box of Fig. 2a with the MS interaction pulses detuned far
off-resonance from the red and blue sideband transitions, such that they only induce
AC Stark shifts on the qubits. Starting with the input state $|\uparrow\uparrow\rangle$, we set the phases
of the final π/2 laser pulses such that this pulse sequence returns
each qubit to the $|\uparrow\rangle$ state. Then, we perform this sequence with the MS
interactions correctly tuned and vary the phases of the MS interactions. Again, in this
case we look for the phase that maps the input state $|\uparrow\uparrow\rangle$ back to itself. We verify
the action of this G operation by creating a Bell state with the pulse sequence
shown in Fig. 2a.

Qubit state preparation and readout. For qubit state preparation, the ⁹Be⁺ ion
is optically pumped to the $|2, 2\rangle$ state, followed by Doppler cooling implemented
by driving the $S_{1/2}|2, 2\rangle \leftrightarrow P_{3/2}|3, 3\rangle$ cycling transition with σ⁺-polarized light.
Similarly, we optically pump the ²⁵Mg⁺ ion to the $|3, 3\rangle$ state and apply Doppler
cooling on the $S_{1/2}|3, 3\rangle \leftrightarrow P_{3/2}|4, 4\rangle$ transition. For ground-state initialization of
the axial motional modes, Raman sideband cooling is applied to the ⁹Be⁺ ion.
To transfer the ⁹Be⁺ $|2, 2\rangle$ state to the $|1, 1\rangle = |\uparrow\rangle_{\rm Be}$ state, we use microwave
composite pulse sequences that are robust against transition detuning errors. These
consist of resonant (θ, φ) pulses (ref. 31), where the first entry denotes
the angle by which the state is rotated about a vector in the x-y plane of the Bloch sphere
and the second represents the azimuthal angle of the rotation axis. With
analogous sequences, we first transfer the ²⁵Mg⁺ ion from the $|3, 3\rangle$ state to the $|2, 2\rangle$
state, and then to the $|3, 1\rangle = |\uparrow\rangle_{\rm Mg}$ state.

State-dependent resonance-fluorescence detection is accomplished with an
achromatic lens system designed for 313 nm and 280 nm (ref. 32). We sequentially
image each ion's fluorescence onto a photomultiplier tube. After reversing the
initial mapping procedures to put the $|\uparrow\rangle$ states back into the respective
cycling-transition ground states, we apply the Doppler cooling beams. The fluorescing or
'bright' state of this protocol therefore corresponds to the $|\uparrow\rangle$ state of each ion. The
$|\downarrow\rangle$ state of each qubit is transferred to $|1, -1\rangle$ and $|2, -2\rangle$ for ⁹Be⁺ and ²⁵Mg⁺,
respectively, with microwave carrier π pulses. These states are 'dark' to the
detection beams and correspond to the $|\downarrow\rangle$ state. This 'shelving' technique is used to
minimize the overlap of the bright- and dark-state photon-count probability
distributions. With detection durations of 330 μs for ⁹Be⁺ and 200 μs for ²⁵Mg⁺, we
detect on average 30 photons for each ion when it is in the bright state and
3.5 photons (predominantly from background light) when it is in the dark
state. The qubit state is determined by choosing a photon-count threshold such
that the states are maximally distinguished. The state preparation and detection
error of 5 × 10⁻³ reported in the main text includes errors due to the threshold
detection protocol (false determination of each detected state as being in the other
state) and the infidelities of the microwave transfer pulses.
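The threshold choice can be illustrated with an idealized model. The sketch below assumes, on our part, that bright and dark counts are Poisson distributed with the quoted means of 30 and 3.5 photons. It then picks the threshold that minimizes the average misassignment error. Real count distributions can deviate from Poisson statistics (for example, through optical pumping during detection). The resulting error is therefore illustrative only and should not be compared directly with the reported 5 × 10⁻³, which also includes transfer-pulse infidelities. Function names are hypothetical.

```python
from math import exp, lgamma, log


def poisson_cdf(t, mean):
    """P(N <= t) for a Poisson distribution with the given mean."""
    return sum(exp(n * log(mean) - mean - lgamma(n + 1)) for n in range(t + 1))


def best_threshold(mean_bright, mean_dark, n_max=100):
    """Threshold t (counts > t are called 'bright') minimizing the mean misassignment error."""
    best = None
    for t in range(n_max):
        err_dark = 1.0 - poisson_cdf(t, mean_dark)   # dark state misread as bright
        err_bright = poisson_cdf(t, mean_bright)     # bright state misread as dark
        err = 0.5 * (err_dark + err_bright)
        if best is None or err < best[1]:
            best = (t, err)
    return best


t, err = best_threshold(30.0, 3.5)
print(f"threshold: counts > {t} -> bright; mean misassignment error = {err:.1e}")
```

The threshold lands well above the dark mean. The long tail of the background-dominated dark distribution and the low tail of the bright distribution then contribute comparable errors.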

Sample size. No statistical methods were used to predetermine sample size. 


31. Levitt, M. H. Composite pulses. Prog. Nucl. Magn. Reson. Spectrosc. 18, 61–122 (1986).

32. Huang, P. & Leibfried, D. Achromatic catadioptric microscope objective in deep ultraviolet with long working distance. Proc. SPIE 5524, 125–133 (2004).
Hybrid quantum logic and a test of Bell's inequality
using two different atomic isotopes


C. J. Ballance, V. M. Schäfer, J. P. Home, D. J. Szwer, S. C. Webster, D. T. C. Allcock, N. M. Linke, T. P. Harty,
D. P. L. Aude Craik, D. N. Stacey, A. M. Steane & D. M. Lucas


Entanglement is one of the most fundamental properties of quantum mechanics, and is the key resource for quantum information processing (QIP). Bipartite entangled states of identical particles have been generated and studied in several experiments, and post-selected or heralded entangled states involving pairs of photons, single photons and single atoms, or different nuclei in the solid state, have also been produced. Here we use a deterministic quantum logic gate to generate a ‘hybrid’ entangled state of two trapped-ion qubits held in different isotopes of calcium, perform full tomography of the state produced, and make a test of Bell’s inequality with non-identical atoms. We use a laser-driven two-qubit gate¹³, whose mechanism is insensitive to the qubits’ energy splittings, to produce a maximally entangled state of one ⁴⁰Ca⁺ qubit and one ⁴³Ca⁺ qubit, held 3.5 micrometres apart in the same ion trap, with 99.8 ± 0.6 per cent fidelity. We test the CHSH (Clauser–Horne–Shimony–Holt)¹⁴ version of Bell’s inequality for this novel entangled state and find that it is violated by 15 standard deviations; in this test, we close the detection loophole but not the locality loophole. Mixed-species quantum logic is a powerful technique for the construction of a quantum computer based on trapped ions, as it allows protection of memory qubits while other qubits undergo logic operations or are used as photonic interfaces to other processing units. The entangling gate mechanism used here can also be applied to qubits stored in different atomic elements; this would allow both memory and logic gate errors caused by photon scattering to be reduced below the levels required for fault-tolerant quantum error correction, which is an essential prerequisite for general-purpose quantum computing.

For Schrödinger, entanglement was “the characteristic trait of quantum mechanics”, and it has been at the heart of debates about the foundations of quantum mechanics since the framing of the Einstein–Podolsky–Rosen paradox. The theoretical work of Bell, and of Clauser et al.¹⁴, established an experimental test which could be used to rule out local hidden-variable theories on the basis of correlations between measured properties of entangled particles, and numerous experiments, starting with that of Freedman and Clauser, have confirmed the predictions of quantum mechanics. Tests of Bell’s inequality with trapped ions were the first to close the so-called ‘detection loophole’; hitherto these trapped-ion tests had been carried out exclusively with identical atoms. The entanglement explored in tests of Bell’s inequality is typically an entanglement between distinguishable particles, in the strict quantum mechanical sense, but when the particles are identical in their internal structure and state, they are distinguishable only through their spatial localization. By employing different isotopes, our experiments involve entities that are also distinguishable by many internal properties, such as baryon number, mass, spin and resonant frequencies.

Apart from its intrinsic interest, entanglement is a central resource for quantum information applications, such as quantum cryptography and quantum computing. Trapped atomic ions are one of the most promising technologies for the implementation of quantum computation; several demonstrations of simple multi-qubit algorithms have been made, and the elementary set of quantum logic operations has recently been demonstrated with the precision required for the implementation of fault-tolerant techniques²⁰,²¹. Scaling up trapped-ion systems to the large numbers of qubits required for useful QIP and quantum simulation will almost certainly require the use of more than one species of ion, both for sympathetic laser cooling (which allows independent control of the external and internal atomic degrees of freedom) and for providing robust memory qubits. The best memory qubits reside in hyperfine ground states, which have essentially infinite lifetimes against spontaneous decay but are vulnerable to the scattering of a single photon of resonant laser light. In a complex, multi-zone ion-trap processor it will be difficult to shield the memory qubits sufficiently well from resonant laser beams, so it will be useful to employ different species of ion (for example, as memory and logic qubits), and a high-fidelity entangling gate operation between the two species will be invaluable. A significant initial step was the demonstration of coherent state transfer between different species in the context of precision metrology. The relative merits of using different isotopes versus different elements are discussed below.

In the present work, we entangle qubits stored in two different isotopes of calcium. The ⁴⁰Ca⁺ qubit is stored in the Zeeman-split ground level, $(|{\downarrow}\rangle, |{\uparrow}\rangle) = (4S_{1/2}^{-1/2}, 4S_{1/2}^{+1/2})$, and the ⁴³Ca⁺ qubit is stored in the hyperfine ground states $(|{\Downarrow}\rangle, |{\Uparrow}\rangle) = (4S_{1/2}^{4,+4}, 4S_{1/2}^{3,+3})$; see Fig. 1. The qubit energy splittings differ by some three orders of magnitude ($f_\uparrow \approx 5.4$ MHz, $f_\Uparrow \approx 3.2$ GHz), but the qubits may nevertheless be efficiently coupled via the two-qubit gate mechanism of ref. 13, in which the ‘travelling standing wave’ from a pair of far-detuned laser beams exerts a qubit-state-dependent force on the ions whose magnitude $F$ is largely independent of the qubit frequency. The force originates from a spatially varying light shift and oscillates at the difference frequency $\delta$ between the two beams. When $\delta = f_z + \delta_g$ is set close to the resonant frequency $f_z$ of a normal mode of motion of the two-ion crystal, a two-qubit phase gate may be implemented by applying the force for a time $1/\delta_g$. An advantage of this type of gate is that the phase of the optical field does not need to be referenced to either of the qubit phases (see Methods); this makes scaling the system easier, because the relative optical phase does not need to be controlled between different trap zones or during time delays between gates.
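The timing condition can be checked with a short sketch. In a frame rotating at the mode frequency $f_z$, a force oscillating at $\delta = f_z + \delta_g$ rotates at $\delta_g$, so the motional displacement traces a circle that closes after $1/\delta_g$. The sketch assumes a constant force amplitude and drops all physical prefactors (mass, Lamb–Dicke factor, force magnitude); it also assumes that the quoted gate duration $\tau_g = 27.4$ μs covers both gate halves, each of length $1/\delta_g$.

```python
import cmath
import math

def displacement(t: float, delta_g: float) -> complex:
    """Motional displacement alpha(t), arbitrary units, for a constant-amplitude force.

    In the frame rotating at the mode frequency, the force phasor turns at
    2*pi*delta_g, so alpha(t) is proportional to the integral of
    exp(2j*pi*delta_g*t') from 0 to t, evaluated here in closed form.
    """
    w = 2 * math.pi * delta_g
    return (cmath.exp(1j * w * t) - 1) / (1j * w)

tau_g = 27.4e-6           # total gate duration quoted in the text (s)
delta_g = 2 / tau_g       # assumes each half lasts tau_g / 2 = 1 / delta_g
print(f"gate detuning delta_g ~ {delta_g / 1e3:.1f} kHz")

# The loop is widest halfway through and closes after one full period 1/delta_g,
# leaving the motion disentangled from the qubits.
for frac in (0.25, 0.5, 0.75, 1.0):
    alpha = displacement(frac / delta_g, delta_g)
    print(f"t = {frac:.2f}/delta_g -> |alpha| * delta_g = {abs(alpha) * delta_g:.3f}")
```

With these assumptions the gate detuning comes out near 73 kHz, a small offset from the 2.00 MHz mode frequency.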

Figure 1 | Calcium ion energy levels and experimental geometry. a, Qubit states and Raman transitions in ⁴³Ca⁺ (purple, left) and ⁴⁰Ca⁺ (violet, right). The two Raman beams (blue, centre) have a mean detuning of $\Delta = -1.04$ THz from the $4S_{1/2} \leftrightarrow 4P_{1/2}$ (397 nm) transition and a difference frequency of $\delta = f_z + \delta_g \approx 2.0$ MHz. b, Raman gate beam geometry. The two perpendicular beams are aligned to set the lattice k vector parallel to the trap axis z. The beams have waist radii w = 27 μm, a power of ~5 mW each, and orthogonal linear polarizations as indicated. A third, π-polarized, Raman beam (not shown) co-propagates with one of the gate beams and is used for sub-Doppler sideband cooling and single-qubit operations on ⁴⁰Ca⁺. The quantization axis is set by a magnetic field B ≈ 0.2 mT. The diagram is not to scale: the ions are separated by 3.5 μm, which is 12.5 periods of the standing wave and around 20,000 times the atomic radius of calcium.

An important difference in the gate mechanism compared with the case of identical ions is that the forces on corresponding qubit states differ ($F_\uparrow \neq F_\Uparrow$ and $F_\downarrow \neq F_\Downarrow$), so that, in general, the four possible qubit states $(|{\uparrow\Uparrow}\rangle, |{\uparrow\Downarrow}\rangle, |{\downarrow\Uparrow}\rangle, |{\downarrow\Downarrow}\rangle)$ each acquire different phases. We choose to implement the gate operation in two halves, each of duration $\tau_g/2 = 1/\delta_g$, separated by spin-flip operations (π pulses) on the qubits (Fig. 2a). This symmetrizes the gate operation $G$ such that the relative phases acquired by the four states are $(0, \Phi, \Phi, 0)$. By setting the laser power (that is, the effective Rabi frequency) and the gate detuning $\delta_g$ appropriately, such that $\Phi = \pi/2$, and enclosing the gate operation in a Ramsey interferometer (two pairs of π/2 pulses), we can generate the maximally entangled Bell state $(|{\downarrow\Downarrow}\rangle + |{\uparrow\Uparrow}\rangle)/\sqrt{2}$ from the initial state $|{\downarrow\Downarrow}\rangle$. The π pulses also protect the qubits against dephasing due to slow variations in magnetic fields.
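Both steps above can be checked numerically with a minimal sketch, assuming ideal instantaneous pulses and the basis order $(|{\downarrow\Downarrow}\rangle, |{\downarrow\Uparrow}\rangle, |{\uparrow\Downarrow}\rangle, |{\uparrow\Uparrow}\rangle)$ with ⁴⁰Ca⁺ as the first qubit. The half-gate is modelled as a diagonal operator with four arbitrary phases (a stand-in for the unequal forces), and the rotation convention in the hypothetical helper `ry` is an illustrative choice, not the experiment's pulse phases.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

# Basis order: |dn dn>, |dn up>, |up dn>, |up up> (first factor = 40Ca+ qubit).
X = np.array([[0, 1], [1, 0]], dtype=complex)
XX = np.kron(X, X)                                # pi pulses on both qubits

# Hypothetical half-gate: four unrelated phases, standing in for unequal forces.
half = np.diag(np.exp(1j * rng.uniform(0, 2 * np.pi, 4)))

# Sequence: half -> XX -> half. Removing the leftover XX (a local operation)
# leaves G, which is diagonal with phases (0, Phi, Phi, 0) up to a global phase.
U = half @ XX @ half
G = XX @ U
phases = np.angle(np.diag(G) / G[0, 0])
print("relative phases:", np.round(phases, 4))

def ry(theta: float) -> np.ndarray:
    """Single-qubit rotation about y by angle theta."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]], dtype=complex)

# Ramsey sequence around an ideal gate with Phi = pi/2, starting from |dn dn>.
gate = np.diag([1, 1j, 1j, 1])
psi0 = np.array([1, 0, 0, 0], dtype=complex)
psi = np.kron(ry(-np.pi / 2), ry(-np.pi / 2)) @ gate @ np.kron(ry(np.pi / 2), ry(np.pi / 2)) @ psi0
print("populations:", np.round(np.abs(psi) ** 2, 4))
```

With this convention the output state is $(|{\downarrow\Downarrow}\rangle - i|{\uparrow\Uparrow}\rangle)/\sqrt{2}$ up to a global phase; the relative phase between the two components is set by the phases of the Ramsey pulses, so adjusting them yields the Bell state stated above.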

In our experiment, we implement the gate using the in-phase axial motional mode (at $f_z = 2.00$ MHz) of a linear Paul trap, with the ion separation (3.5 μm) equal to a half-integer number of standing-wave periods, thus exciting the motion maximally for the $|{\uparrow\Downarrow}\rangle$ and $|{\downarrow\Uparrow}\rangle$ states. The Lamb–Dicke parameters for the two isotopes are $\eta_{40} = 0.121$ and $\eta_{43} = 0.126$. After initial Doppler cooling, both axial modes are cooled close to their ground states (mean occupation number $\bar{n} < 0.1$) by Raman sideband cooling applied to the ⁴⁰Ca⁺ ion, which sympathetically cools the ⁴³Ca⁺ ion. Both qubits are initialized by optical pumping, after which we apply the gate sequence shown in Fig. 2a, using a gate duration $\tau_g = 27.4$ μs. Single-qubit π/2 and π pulses, for the spin-echo and tomography operations, are applied using co-propagating Raman beams (for ⁴⁰Ca⁺) and microwaves (for ⁴³Ca⁺). The ordering of the ion pair in the trap was kept constant over the time taken to acquire the full data set, to guard against systematic effects associated with ion position (see Methods). We implement individual single-shot qubit readout by state-selectively shelving both ions to the $3D_{5/2}$ level simultaneously, then detecting the ions’ fluorescence sequentially in two photomultiplier counting periods (see Methods).

From the contrast of the parity fringes shown in Fig. 2b, and a measurement of the qubit populations before the analysis pulses, we estimate the fidelity of the Bell state produced by the gate to be F = 99.8(6)%, where the error (0.6%) is dominated by statistical uncertainty. Known contributions to the gate error are significantly smaller than the statistical uncertainty; for example, the photon-scattering error at the Raman detuning $\Delta = -1.04$ THz used here is estimated to be approximately 0.1%. Because the two qubits may be rotated independently by addressing them in frequency space, we can also perform full tomography of the entangled state and extract the density matrix (Fig. 3); the density matrix is consistent with that of the desired Bell state, to within the systematic errors from the imperfect tomography pulses, and gives a separate estimate of the fidelity, F = 99(1)%. In both cases, F represents the fidelity of the entangling gate operation; it excludes errors due to state preparation and readout, which we characterize in independent experiments (see Methods).

Figure 2 | Entangling gate sequence and results. a, Gate sequence, showing the operations applied to the ⁴⁰Ca⁺ (upper line) and ⁴³Ca⁺ (lower line) qubits, where G is the gate operation. The final state-analysis (tomography) π/2 pulses shown in green are optional; by scanning their phase φ we can diagnose the state produced by the gate. b, Qubit populations and parity signal after correcting for readout errors (see Methods). The individual qubit populations (open squares and inverted open triangles) are consistent with 1/2, as expected for the Bell state $(|{\downarrow\Downarrow}\rangle + |{\uparrow\Uparrow}\rangle)/\sqrt{2}$. The parity signal $P(\uparrow\Downarrow) + P(\downarrow\Uparrow)$ (open circles), that is, the probability of the two qubits being in opposite states, should oscillate sinusoidally in 2φ between 0 and 1 for a perfect Bell state. From the contrast of the parity signal and a measurement of the populations without the analysis pulses, we infer a Bell-state fidelity of 99.8(6)%. The error bars show 1σ statistical errors.

Figure 3 | Density matrix of the mixed-isotope Bell state. a, b, The plots show the real (a) and imaginary (b) parts of the density matrix ρ, after correcting for qubit readout errors (see Methods). These were measured by rotating each qubit independently to perform full quantum state tomography. We used a maximum-likelihood method to find the density matrix that best represents the experimental data. This gives a separate estimate of the gate fidelity, 99(1)%.

Table 1 | Bell/CHSH inequality test results, using the mixed-isotope entangled state

θa (⁴⁰Ca⁺)     π/4        3π/4       π/4        3π/4
θb (⁴³Ca⁺)     π/2        π/2        0          0
E(θa, θb)      0.565(7)   0.530(7)   0.560(7)   −0.573(8)

The qubits a and b are independently rotated through angles $(\theta_a, \theta_a') = (\pi/4, 3\pi/4)$ and $(\theta_b, \theta_b') = (\pi/2, 0)$, and for each combination of angles the correlation function $E(\theta_a, \theta_b)$ is measured, with the results shown. (E is defined as in ref. 17.) The CHSH parameter is given by $S = |E(\theta_a, \theta_b) + E(\theta_a', \theta_b)| + |E(\theta_a, \theta_b') - E(\theta_a', \theta_b')| = 2.228(15) > 2$, thus violating Bell’s inequality for this system of non-identical atoms. The state detection errors are sufficiently small (approximately 6%, see Methods) that it is not necessary to make a fair-sampling assumption. For each angle setting, 4,000 measurements were made.

To perform a test of the CHSH version of Bell’s inequality, we follow the gate sequence with further independent single-qubit rotations and measurements. The single-qubit rotations have constant phase φ but varying rotation angle θ. From these measurements we determine the two-particle correlation functions, with the results shown in Table 1. As is well known, the maximal CHSH parameter S allowed by local hidden-variable theories is 2, whereas quantum mechanics allows $S \le 2\sqrt{2}$. To avoid having to make a fair-sampling assumption, we do not correct for qubit readout errors in these experiments. The finite detection error then limits the CHSH parameter to a detectable maximum $S_\mathrm{max} = 2.236(7)$ for a perfect Bell state; our result, $S = 2.228(15)$, is consistent with $S_\mathrm{max}$ to within the stated uncertainties and violates the CHSH inequality by approximately 15 standard deviations.

The mixed-species quantum logic gate that we have demonstrated has allowed us to create a novel entangled state, leading to the first test of a Bell inequality violation between isolated non-identical atoms. As an application, the two isotopes used here could be employed in scalable quantum computing architectures based on trapped ions: hyperfine qubits in ⁴³Ca⁺ at present constitute the best single-qubit memories (coherence time $T_2^* \approx 1$ min), whereas the simpler atomic structure of ⁴⁰Ca⁺ is well suited for use as a ‘photonic interconnect’ qubit. There are technical advantages to using ions of similar mass for sympathetic cooling and ion transport in multi-zone traps. However, while the relatively small isotope shifts (~1 GHz) allow the convenient use of the same laser systems for manipulation of both species, they may provide insufficient protection of qubits from stray resonant light unless tightly focused beams are used. Therefore, in the long term it may be necessary to use different atomic elements. The gate mechanism employed here is independent of the qubit frequency and thus can also be used to couple qubits stored in different elements, provided that the Raman laser fields exert sufficient force on both qubits. We note that Ca⁺ and Sr⁺ ions are an attractive choice in this respect: the $4S_{1/2} \leftrightarrow 4P_{1/2}$ transition in Ca⁺ is separated from the $5S_{1/2} \leftrightarrow 5P_{3/2}$ transition in Sr⁺ by 20 THz. A Raman laser detuning of $\Delta = -8$ THz (comparable to that used in our recent ⁴³Ca⁺–⁴³Ca⁺ two-qubit gate experiments²¹) would enable the implementation of a mixed-species logic gate with a photon-scattering error of ~10⁻⁴, substantially below the error threshold for fault-tolerant operations.
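The scaling behind the ~10⁻⁴ estimate can be sketched as an order-of-magnitude check, under the common approximation that a Raman gate's photon-scattering error falls roughly as 1/|Δ| (the scattering rate grows as I/Δ² while the gate speed grows as I/Δ). This ignores fine-structure and multi-level corrections, which matter when |Δ| approaches the fine-structure splitting, and it is not the authors' calculation; the reference point is the ~0.1% error quoted earlier at Δ = −1.04 THz, and the function name is hypothetical.

```python
def scaled_scattering_error(err_ref: float, delta_ref_hz: float, delta_new_hz: float) -> float:
    """Rescale a Raman-gate photon-scattering error to a new detuning.

    Assumes error proportional to 1/|Delta|, valid only as a rough approximation.
    """
    return err_ref * abs(delta_ref_hz) / abs(delta_new_hz)

err_8thz = scaled_scattering_error(1e-3, -1.04e12, -8e12)
print(f"estimated scattering error at -8 THz: {err_8thz:.1e}")
```

This gives about 1.3 × 10⁻⁴, consistent in order of magnitude with the ~10⁻⁴ quoted; for a Ca⁺–Sr⁺ pair the effective detuning differs for each ion, which this one-line scaling does not capture.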

Similar experiments using trapped-ion qubits stored in two different elements (⁹Be⁺ and ²⁵Mg⁺) have recently been carried out in the NIST Ion Storage Group. We note that after the submission of the present manuscript, a CHSH–Bell test that closes both the detection and locality loopholes, using heralded entanglement of remote electron spins, was reported.

Online Content Methods, along with any additional Extended Data display items and 
Source Data, are available in the online version of the paper; references unique to 
these sections appear only in the online paper. 
METHODS


Ion crystal order. The ⁴⁰Ca⁺–⁴³Ca⁺ ion crystal ordering is kept constant during the experiments to control systematic errors. The principal error that would arise if the ion order were not controlled is due to an (undesired) axial magnetic field gradient, which causes the magnetic field at the two ion positions to differ by 0.18 μT. As a result, the qubit frequencies for the two possible ion orders differ by approximately 5 kHz, which would lead to errors in single-qubit rotations. We measure the frequency of each qubit using slow (typically 100 μs) carrier π pulses, interleaved with the main experimental pulse sequence; this allows us to detect and correct for both common-mode qubit frequency changes (due to drift in the global magnetic field B) and differential changes (due to incorrect ion crystal ordering). If the ion order is wrong, we randomly reorder the crystal, using a short period of Doppler heating to melt the crystal followed by a short period of Doppler cooling, and repeat until the order is correct.
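The 5 kHz figure follows from the linear Zeeman splitting of the ⁴⁰Ca⁺ ground-level qubit, f = g_J μ_B B / h. The sketch below assumes g_J ≈ 2.0023 for the $4S_{1/2}$ level and uses μ_B/h ≈ 13.996 GHz T⁻¹; both are standard values rather than numbers from this paper, and the function name is hypothetical.

```python
MU_B_OVER_H = 13.996e9   # Bohr magneton / Planck constant (Hz per tesla)
G_J_S12 = 2.0023         # Lande g-factor of the 4S1/2 level (close to the free-electron value)

def zeeman_qubit_freq(b_tesla: float) -> float:
    """Splitting of the 40Ca+ ground-level Zeeman qubit (M = -1/2 <-> +1/2), linear regime."""
    return G_J_S12 * MU_B_OVER_H * b_tesla

print(f"qubit frequency at 0.2 mT: {zeeman_qubit_freq(0.2e-3) / 1e6:.2f} MHz")
print(f"shift for a 0.18 uT field difference: {zeeman_qubit_freq(0.18e-6) / 1e3:.2f} kHz")
```

The same formula gives about 5.6 MHz at exactly 0.2 mT; the quoted qubit frequency of ≈5.4 MHz corresponds to B ≈ 0.19 mT, consistent with the stated B ≈ 0.2 mT.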

Single-qubit phases and light shifts. Despite the qubits having very different frequencies, no special phase control is needed to implement the entangling gate. The ⁴³Ca⁺ qubit phase is tracked by the microwave local oscillator, and the ⁴⁰Ca⁺ qubit phase is tracked by the difference phase of the co-propagating Raman beams, which is in turn referenced to a radio-frequency local oscillator. The phases of the Raman beams that implement the entangling gate have no relationship to either of the qubit phases. However, the travelling standing wave resulting from the interference of the Raman gate beams also generates an isotope-dependent differential light shift on each qubit, with an amplitude that oscillates at the Raman difference frequency δ. Over the course of the gate operation this light shift adds phase shifts to the qubits that depend on the (uncontrolled) optical phase difference of the Raman beams, and these uncontrolled phase shifts reduce the fidelity of the gate operation. We greatly reduce this light-shift error by shaping the turn-on and turn-off of the Raman laser intensities with a characteristic time of 1 μs; we estimate that without this pulse shaping the light shift would lead to an average gate error of up to 5% (see ref. 27).

We adjust the polarization of each Raman beam individually to null the differential light shift from each single beam on the ⁴⁰Ca⁺ qubit. (The interference of the two gate beams nevertheless gives rise to a polarization modulation, which provides the state-dependent force.) Owing to the difference in atomic structure, there is a residual light shift on the ⁴³Ca⁺ qubit of approximately 0.2% of the light shift for a purely circularly polarized beam of the same intensity and frequency. This small light shift does not cause any significant problems in the experiments reported here; if necessary, it could be suppressed further by increasing the Raman detuning, at the expense of requiring more Raman beam power.
State preparation and measurement errors. To perform individual single-shot qubit readout, we selectively shelve one qubit state of each ion to the $3D_{5/2}$ level, then apply the Doppler cooling lasers sequentially in time, first for one isotope and then for the other. If an ion was not shelved it fluoresces, and this is detected with a photomultiplier. We shelve the two isotopes simultaneously using a weak 393 nm beam resonant with the ⁴³Ca⁺ $4S_{1/2} \leftrightarrow 4P_{3/2}$ transition, with a 1.94 GHz sideband (produced by an electro-optic modulator) which drives the ⁴⁰Ca⁺ $4S_{1/2} \leftrightarrow 4P_{3/2}$ transition. An intense 850 nm beam resonant with the ⁴⁰Ca⁺ $3D_{3/2} \leftrightarrow 4P_{3/2}$ transition makes the shelving for this isotope state-selective, via electromagnetically induced transparency³². The ⁴³Ca⁺ shelving is state-selective owing to the 3.2 GHz splitting between the two qubit states³³. Both these shelving processes have a maximum theoretical efficiency of ~90% due to leakage to $3D_{3/2}$ (which for Ca⁺ could be eliminated using a further 850 nm beam if required³³), leading to readout errors of ε ≈ 5% when averaged over both qubit states. From independent experiments (similar to those we describe in ref. 20), we estimate the state-preparation error to be approximately 0.1%, which is negligible compared with the readout error.

We measure the readout errors for each qubit state of each isotope by preparing and measuring each state typically 10,000 times. Since the qubits are measured individually, it is then straightforward to calculate the linear mapping that corrects for the readout errors, provided that they remain constant. The readout errors relevant to the entangling gate experiment (Fig. 2) were measured to be $\epsilon_{40} = 7.7(2)\%$ for ⁴⁰Ca⁺ and $\epsilon_{43} = 4.4(2)\%$ for ⁴³Ca⁺ (averaged over both qubit states). Measurements of the readout errors were interleaved with the gate experimental runs, to check for systematic drifts, and were made using the mixed-isotope crystal, to avoid systematic effects associated with ion position. We estimate the systematic uncertainty in determining the readout errors to be approximately 0.1%, less than the statistical error in these measurements. If we did not correct for readout errors, the apparent infidelity of the Bell state would increase by approximately $\tfrac{3}{2}(\epsilon_{40} + \epsilon_{43}) \approx 18\%$. For the CHSH test, we do not correct for readout errors, but we nevertheless measure them in order to calculate the maximum attainable CHSH parameter ($S_\mathrm{max}$).
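The readout correction and the ~18% figure can be sketched under one simplifying assumption: each qubit's readout flips the outcome with a single, state-independent probability ε (the text measures the error for each state separately, so the real correction matrix need not be symmetric). Because the two qubits are read out independently, the joint readout matrix is a tensor product and the correction is its inverse. The apparent fidelity uses F = (P_even + C)/2, where P_even is the population in $|{\downarrow\Downarrow}\rangle$ and $|{\uparrow\Uparrow}\rangle$ and C is the parity-fringe amplitude; to first order, readout errors reduce P_even by (ε₄₀ + ε₄₃) and C by 2(ε₄₀ + ε₄₃), hence the factor 3/2. Function names are hypothetical.

```python
import numpy as np

def readout_matrix(eps: float) -> np.ndarray:
    """Single-qubit readout matrix for a symmetric flip probability eps.

    Column j is the prepared state, row i the probability of reading out i.
    """
    return np.array([[1 - eps, eps], [eps, 1 - eps]])

eps40, eps43 = 0.077, 0.044                      # averaged readout errors from the text
M = np.kron(readout_matrix(eps40), readout_matrix(eps43))
M_inv = np.linalg.inv(M)                         # linear map that undoes readout errors

def correct(p_measured: np.ndarray) -> np.ndarray:
    """Estimate true joint-outcome probabilities from measured ones."""
    return M_inv @ p_measured

# Apparent Bell-state fidelity if readout errors are NOT corrected (ideal state assumed).
p_even = (1 - eps40) * (1 - eps43) + eps40 * eps43   # P(dn dn) + P(up up)
contrast = (1 - 2 * eps40) * (1 - 2 * eps43)         # parity-fringe amplitude
f_apparent = (p_even + contrast) / 2
print(f"exact apparent infidelity:      {1 - f_apparent:.3f}")
print(f"first-order 1.5*(eps40+eps43): {1.5 * (eps40 + eps43):.3f}")
```

The exact value in this model (about 17%) and the first-order estimate (about 18%) agree at the level of the approximation quoted in the text.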

Sample size. No statistical methods were used to predetermine sample size. 


32. McDonnell, M. J. et al. High-efficiency detection of a single quantum of angular 
momentum by suppression of optical pumping. Phys. Rev. Lett. 93, 153601 
(2004). 

33. Myerson, A. H. et al. High-fidelity readout of trapped-ion qubits. Phys. Rev. Lett. 
100, 200502 (2008). 
Radiative heat transfer in the extreme near field


Kyeongtae Kim, Bai Song, Víctor Fernández-Hurtado, Woochul Lee, Wonho Jeong, Longji Cui, Dakotah Thompson, Johannes Feist, M. T. Homer Reid, Francisco J. García-Vidal, Juan Carlos Cuevas, Edgar Meyhofer & Pramod Reddy


Radiative transfer of energy at the nanometre length scale is of great importance to a variety of technologies including heat-assisted magnetic recording, near-field thermophotovoltaics and lithography. Although experimental advances have enabled elucidation of near-field radiative heat transfer in gaps as small as 20–30 nanometres (refs 4–6), quantitative analysis in the extreme near field (less than 10 nanometres) has been greatly limited by experimental challenges. Moreover, the results of pioneering measurements differed from theoretical predictions by orders of magnitude. Here we use custom-fabricated scanning probes with embedded thermocouples, in conjunction with new microdevices capable of periodic temperature modulation, to measure radiative heat transfer down to gaps as small as two nanometres. For our experiments we deposited suitably chosen metal or dielectric layers on the scanning probes and microdevices, enabling direct study of extreme near-field radiation between silica–silica, silicon nitride–silicon nitride and gold–gold surfaces to reveal marked, gap-size-dependent enhancements of radiative heat transfer. Furthermore, our state-of-the-art calculations of radiative heat transfer, performed within the theoretical framework of fluctuational electrodynamics, are in excellent agreement with our experimental results, providing unambiguous evidence that confirms the validity of this theory for modelling radiative heat transfer in gaps as small as a few nanometres. This work lays the foundations required for the rational design of novel technologies that leverage nanoscale radiative heat transfer.

Radiative heat transfer in the far field, that is, at gap sizes larger than Wien’s wavelength (~10 μm at room temperature), is well established. However, near-field radiative heat transfer (NFRHT), where the gap sizes are smaller than Wien’s wavelength, remains relatively unexplored. Over the past decade, a series of technical advances have enabled experiments for gap sizes as small as 20 nm to study NFRHT and broadly verify the validity of a theoretical framework called fluctuational electrodynamics for modelling NFRHT. In contrast, recent experiments on extreme (e) NFRHT with single-digit nanometre gap sizes (<10 nm) between gold (Au) surfaces have questioned the validity of fluctuational electrodynamics and have raised the question of whether additional mechanisms, even of non-radiative origin such as phonon tunnelling, could dominate the heat transfer in this regime. In addition, some newer computational eNFRHT studies on dielectrics have suggested that the local form of fluctuational electrodynamics, in which one assumes the dielectric properties of the media to be local in space, is inadequate for modelling eNFRHT. Yet other computations on dielectrics have asserted that such non-local effects are irrelevant even for gap sizes as small as 1 nm. This disagreement is of great concern because understanding eNFRHT is critical for the development of a range of novel technologies. Here we present experimental and computational results that both demonstrate marked increases in heat fluxes in the extreme near field and establish the validity of fluctuational electrodynamics for modelling and predicting eNFRHT for dielectric as well as metal surfaces in gap sizes as small as a few nanometres.
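Wien’s wavelength here means the peak of the blackbody spectrum, λ_max = b/T, with Wien’s displacement constant b ≈ 2.898 × 10⁻³ m K (a standard value, not from this paper). A minimal sketch of the estimate for room temperature and for the 425 K substrate temperature used in the measurements:

```python
WIEN_B = 2.898e-3  # Wien's displacement constant (m K)

def wien_peak_wavelength(temperature_k: float) -> float:
    """Wavelength (m) at which blackbody spectral radiance per unit wavelength peaks."""
    return WIEN_B / temperature_k

for temperature in (300.0, 425.0):
    lam_um = wien_peak_wavelength(temperature) * 1e6
    print(f"T = {temperature:.0f} K -> lambda_max = {lam_um:.1f} um")
```

Both values (about 9.7 μm and 6.8 μm) exceed the few-nanometre gaps studied here by roughly three orders of magnitude.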

Experimental elucidation of radiative heat transfer across few-nanometre-sized gaps is exceedingly difficult, owing to numerous technical challenges in creating and stably maintaining such gaps while simultaneously measuring minute (pW) heat currents across them. One key innovation used in this work to overcome these challenges was to leverage highly sensitive, custom-fabricated probes with embedded Au–Cr thermocouples (Fig. 1a–c), called scanning thermal microscopy (SThM) probes. The SThM probes were fabricated by deposition of multiple metal and dielectric layers to create a nanoscopically small Au–Cr thermocouple at the very end of the tip. Our probes were optimized to have both a high thermal resistance ($R_P \approx 10^6$ K W⁻¹) and a high stiffness (>4 N m⁻¹), and were coated with a desired dielectric (silica (SiO₂) or silicon nitride (SiN)) or metal (Au) layer. The resulting probes have tip diameters ranging from 350 nm to 900 nm (for details see Fig. 1b and Supplementary Figs 1–3).

The basic strategy for quantifying NFRHT is to record the tip temperature, via the embedded nanoscale thermocouple, which rises in proportion to the radiative heat flow when the tip is displaced towards a heated substrate. To eliminate conductive and convective heat transfer and to remove any water adsorbed to the surfaces, all measurements were performed in ultra-high vacuum (UHV) using a modified scanning probe microscope (RHK UHV 7500) housed in an ultra-low-noise facility (see Supplementary Information). In performing the measurements, the substrate is heated to an elevated temperature ($T_S = 425$ K) while the SThM probe, mounted in the scanner of the scanning probe microscope, is connected to a thermal reservoir maintained at a temperature $T_R = 310$ K. The separation between the probe and the substrate is reduced at a constant rate of 0.5 nm s⁻¹ from a gap size of 50 nm until probe–substrate contact. During this process the temperature difference between the tip ($T_P$) and the reservoir ($T_R$), $\Delta T_P = T_P - T_R$, is monitored via the embedded thermocouple (see Supplementary Information), while the deflection of the cantilever is concurrently measured optically via an incident laser (Fig. 1a).

A typical deflection trace for a SiO2-coated tip approaching a SiO,- 
coated surface is shown in Fig. 2a. From the deflection trace it is appar- 
ent that the gap size can be controllably reduced to values as small 
as ~2 nm, below which the tip rapidly ‘snaps’ towards the substrate 
and makes contact (see Supplementary Information). This instability 
is created by attractive forces between the tip and the substrate that 
arise owing to Casimir and/or electrostatic forces. Figure 2a shows the 
simultaneously measured AT», which represents the sudden increase 
in temperature that occurs when the tip snaps into the substrate. This 
rapid increase in tip temperature (~2 K) upon mechanical contact 
is due to heat conduction, via the solid—solid contact, from the hot 
substrate (425 K) to the tip of the SThM probe, the temperature of 
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Figure 1 | Experimental set-up and SEM images of SThM probes and 
suspended microdevices. a, Schematic of the experimental set-up, 

in which an SThM probe is in close proximity to a heated substrate 
(insets show cross-sections of the SThM probe). The scenario for SiO, 
measurements is shown (the coating on the substrate is replaced with SiN 
and Au in other experiments). b, SEM image (top) of a SThM probe. The 
inset shows an SEM image of the hemispherical probe tip, which features 
an embedded Au-Cr thermocouple from which the thermoelectric 
voltage Vc is measured. The bottom panel illustrates a schematic cross- 
section for a SiO2-coated probe used in SiO, measurements. For SiN and 


which is ~400 K (heating by the incident laser results in an elevated 
temperature). 

The tight temporal correlation between the mechanical snap-in and 
the temperature jump of the probe makes it possible to identify tip- 
substrate contact solely on the basis of temperature signals. In Fig. 2b, 
the recorded tip temperature is shown as a probe approaches a heated 
substrate with the laser beam turned off. The recorded temperature 
signals with and without laser tracking are basically identical (Fig. 2a, b), 
except that the magnitude of the jump reflects the tip—substrate tem- 
perature difference with and without laser excitation. Thus, mechanical 
contact can be readily detected from the robust temperature jump with- 
out laser excitation, thereby avoiding probe heating and laser interfer- 
ence effects. Therefore, we performed all experiments by first estimating 
the snap-in distance using the optical scheme and subsequently turning 
the laser off to perform eNFRHT measurements (see Supplementary 
Information for the measurement of gap size and snap-in distance). 

To determine the gap (d)-dependent near-field radiative conduct- 
ance (Genrrut), we measured AT» and directly estimated Geyrrut 
from Genrrut(d) = ATp/ [Rp(Ts = TR = AT»)] > where Rp is the ther- 
mal resistance of the probe, which was experimentally determined 
as described in Supplementary Information (Supplementary Fig. 7) 
to be 1.6 x 10°K W~ and 1.3 x 10°K W for the SiO2- and SiN- 
coated probes, respectively. The measured conductance of the gaps 
for SiOz and SiN surfaces is shown in Fig. 3a and b, respectively. It can 
be seen that Gexprnr increases monotonically until the probe snaps 
into contact (gap size at snap-in is ~2nm for both SiO, and SiN 
measurements; see Supplementary Information and Supplementary 
Fig. 6). Furthermore, it can be seen that the eNFRHT is larger for 
experiments performed with SiO. These measurements represent 
Au measurements, the outer SiO2 coating is appropriately substituted
as explained in Supplementary Information. A resistance network that
describes the thermal resistance of the probe ($R_{\mathrm{P}}$) and the vacuum gap
($R_{\mathrm{G}} = G_{\mathrm{eNFRHT}}^{-1}$), as well as the temperatures of the substrate ($T_{\mathrm{S}}$),
tip ($T_{\mathrm{P}}$) and reservoir ($T_{\mathrm{R}}$), is also shown. c, Schematic showing the
measurement scheme used for high-resolution eNFRHT measurements of
Au-Au. The amplitude of the supplied sinusoidal electric current is $I$; the
sinusoidal temperature oscillations at 2f are related to the voltage output
$V_{3f}$. d, SEM image of the suspended microdevice featuring the central
region coated with Au and a serpentine Pt heater-thermometer.


the first observation of eNFRHT in single-digit nanometre-sized 
gaps between dielectric surfaces. We compared these results to our 
computational predictions based on fluctuational electrodynamics, 
assuming local-dielectric properties (see details later), and found very 
good agreement (blue lines in Fig. 3a, b). 
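The conductance expression used above follows from a series thermal network (substrate, vacuum gap, probe, reservoir): the heat leaving the tip through the probe, $\Delta T_{\mathrm{P}}/R_{\mathrm{P}}$, must equal the heat crossing the gap, $G_{\mathrm{eNFRHT}}(T_{\mathrm{S}} - T_{\mathrm{P}})$, with $T_{\mathrm{P}} = T_{\mathrm{R}} + \Delta T_{\mathrm{P}}$. A minimal sketch under that assumption is shown below; the function name and the 1 mK tip temperature rise are hypothetical, and $R_{\mathrm{P}} = 1.6 \times 10^{6}$ K W⁻¹ is our reading of the SiO2-probe value.

```python
def near_field_conductance(delta_tp, r_p, t_s, t_r):
    """G_eNFRHT = dT_P / [R_P (T_S - T_R - dT_P)] in W/K.

    delta_tp -- measured rise of the tip temperature above the reservoir (K)
    r_p      -- thermal resistance of the probe (K/W)
    t_s, t_r -- substrate and reservoir temperatures (K)
    """
    gap_drop = t_s - t_r - delta_tp  # temperature difference across the vacuum gap
    if gap_drop <= 0:
        raise ValueError("the substrate must be hotter than the tip")
    return delta_tp / (r_p * gap_drop)


# Hypothetical reading: 1 mK tip rise, SiO2-coated probe, substrate 425 K, reservoir 310 K.
g = near_field_conductance(1e-3, 1.6e6, 425.0, 310.0)
print(f"G_eNFRHT = {g * 1e12:.2f} pW/K")
```

Because $R_{\mathrm{P}}$ is of order 10⁶ K W⁻¹, a picowatt-per-kelvin conductance produces only millikelvin-scale tip temperature changes, which is why the probe's thermal resistance sets the measurement sensitivity.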

The remarkable agreement between eNFRHT measurements and 
computational predictions raises important questions with regard
to recent experiments’ investigating eNFRHT between Au surfaces, 
which suggested strong disagreements (~500-fold) between predic- 
tions of fluctuational electrodynamics and the results of experiments. 
One may wonder if the good agreement reported above is unique to 
eNFRHT between polar dielectric materials. To answer this question 
unambiguously, we performed additional eNFRHT measurements 
with Au-coated probes and substrates. The measured conductance in 
these experiments is shown in Fig. 3c. It can be seen that, as the gap size
decreases, the measured $G_{\mathrm{eNFRHT}}$ remains comparable to the noise
floor of ~220 pW K⁻¹ for Au-coated probes at an applied temperature
differential of ~115 K (see Supplementary Information) and is much
smaller than that observed for polar dielectrics. These measurements
set an upper bound of ~250 pW K⁻¹ for $G_{\mathrm{eNFRHT}}$ in our Au-Au
experiments. This result is particularly surprising because previous studies
that used probes with smaller diameters and lower thermal resistances
((23-54) x 107K W_! and ~10°K W |, implying a lower sensitivity
than our probes) reported conductances >40 nW K⁻¹, which
are at least two orders of magnitude larger than the conductances
measured by us and predicted by theory.

To resolve this contradiction we needed to improve the resolu- 
tion of our conductance measurements by more than an order of 
magnitude (see Supplementary Information and Supplementary 
Figure 2 | Detection of mechanical contact from deflection and
temperature signals. a, Data from an experiment in which a SiO2-coated
probe at about 400 K (heated by the incident laser) is displaced towards
a heated SiO2 substrate at 425 K. The deflection of the scanning probe
(blue), reported in arbitrary units (a.u.), and the rise in temperature of the probe,
$\Delta T_{\mathrm{P}}$ (red), are shown. The sudden decrease in the deflection signal due
to snap-in coincides with a simultaneous increase in the tip temperature
due to conduction of heat from the hot substrate to the cold tip, clearly
showing that contact can be readily detected by the large temperature
jump. The snap-in distance is seen to be ~2 nm. b, Measured $\Delta T_{\mathrm{P}}$ when
an unheated probe (310 K, laser turned off) is displaced towards the
substrate. A sudden increase in the tip temperature is seen when the cold
tip contacts the substrate. Inset shows the increase in the tip temperature
due to eNFRHT.


Fig. 8 for details). This was accomplished by using a new microdevice
(see Fig. 1c, d and Supplementary Figs 4, 5, 9, 10 for details of
device fabrication and characterization) that features a suspended
island whose temperature can be readily modulated at f = 18 Hz
(see Supplementary Information). Sinusoidal electric currents
(9 Hz) supplied to the embedded electrical heater resulted in
sinusoidal temperature oscillations at the second harmonic (18 Hz) with
amplitude $\Delta T_{\mathrm{S},f=18\,\mathrm{Hz}}$ that was accurately measured using a lock-in
technique (see Supplementary Information). To characterize
eNFRHT, we positioned a Au-coated SThM probe (30 nm Au
thickness) in close proximity to the surface of the microfabricated
device, which features a suspended region that is 50 μm × 50 μm
in size and was coated with 100 nm of Au. The amplitude of the temperature
modulation of the probe ($\Delta T_{\mathrm{P},f=18\,\mathrm{Hz}}$), due to eNFRHT, was
measured at various gap sizes (see Supplementary Information) in
a bandwidth of 0.78 mHz. Given the low noise in this bandwidth,
it was possible to resolve temperature changes as small as ~20 μK,
which corresponds to a conductance noise floor of ~6 pW K⁻¹ when
$\Delta T_{\mathrm{S},f=18\,\mathrm{Hz}}$ is 5 K (see Supplementary Information section 7 for details
of the noise characterization). The measured $\Delta T_{\mathrm{P},f=18\,\mathrm{Hz}}$ values were
used to estimate $G_{\mathrm{eNFRHT}}$ (Fig. 3d) via
$G_{\mathrm{eNFRHT}}(d) = \Delta T_{\mathrm{P},f=18\,\mathrm{Hz}} / [R_{\mathrm{P,Au}}(\Delta T_{\mathrm{S},f=18\,\mathrm{Hz}} - \Delta T_{\mathrm{P},f=18\,\mathrm{Hz}})]$,
where $R_{\mathrm{P,Au}} = 0.7 \times 10^{6}$ K W⁻¹ is
the thermal resistance of the Au-coated probe (see Supplementary
Information and Supplementary Fig. 7). The smallest gap size at which
measurements could be accomplished is ~3 nm and is limited by both
snap-in and deflections of the microdevice due to periodic thermal
expansion resulting from bimaterial effects (see Supplementary
Fig. 11). The measured $G_{\mathrm{eNFRHT}}$ (Fig. 3d) is indeed much smaller than
that obtained with SiO2 (Fig. 3a) and SiN (Fig. 3b) films. In contrast
to previous experiments, our measured $G_{\mathrm{eNFRHT}}$ for Au-Au surfaces
is in excellent agreement with the predictions of fluctuational
electrodynamics (solid line in Fig. 3d).
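Two steps of the modulation scheme can be checked numerically. Joule heating goes as the square of the current, so a 9 Hz current heats the island at 18 Hz, and a dual-phase (lock-in style) demodulation recovers the amplitude of that second harmonic. The sketch below, under stated assumptions, shows both and then plugs the quoted noise-floor numbers (20 μK, 5 K, $R_{\mathrm{P,Au}} = 0.7 \times 10^{6}$ K W⁻¹) into the amplitude form of the conductance expression. The current amplitude, heater resistance, sampling rate and record length are hypothetical, and a plain average over an integer number of periods stands in for the low-pass filter of a real lock-in amplifier.

```python
import math


def lock_in_amplitude(signal, times, f_ref):
    """Amplitude of the component of `signal` at f_ref (Hz).

    Multiplies by cos and sin references and averages; the average rejects
    other frequencies when the record spans an integer number of periods.
    """
    n = len(signal)
    x = sum(s * math.cos(2 * math.pi * f_ref * t) for s, t in zip(signal, times)) / n
    y = sum(s * math.sin(2 * math.pi * f_ref * t) for s, t in zip(signal, times)) / n
    return 2 * math.hypot(x, y)


def conductance_from_amplitudes(dtp, dts, r_p_au):
    """G_eNFRHT = dT_P / [R_P,Au (dT_S - dT_P)] from 18 Hz amplitudes (W/K)."""
    return dtp / (r_p_au * (dts - dtp))


# i0^2 R sin^2(2 pi f1 t) = (i0^2 R / 2) (1 - cos(2 pi (2 f1) t)): heating at 2 f1.
f1 = 9.0                     # drive frequency of the current (Hz), from the text
i0, r_heater = 1e-3, 1e3     # hypothetical current amplitude (A) and heater resistance (ohm)
fs, duration = 1000.0, 2.0   # hypothetical sampling rate (Hz) and record length (s)
times = [k / fs for k in range(int(fs * duration))]
power = [r_heater * (i0 * math.sin(2 * math.pi * f1 * t)) ** 2 for t in times]

p_2f = lock_in_amplitude(power, times, 2 * f1)  # close to i0**2 * r_heater / 2
p_1f = lock_in_amplitude(power, times, f1)      # essentially zero at the drive frequency

g_floor = conductance_from_amplitudes(20e-6, 5.0, 0.7e6)
print(f"P(18 Hz) = {p_2f * 1e3:.3f} mW, P(9 Hz) = {p_1f * 1e3:.3f} mW")
print(f"conductance noise floor ~ {g_floor * 1e12:.1f} pW/K")
```

The noise-floor arithmetic lands at about 6 pW K⁻¹, consistent with the value quoted in the text, which also supports reading the Au-probe resistance as 0.7 × 10⁶ K W⁻¹.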

To obtain insight into our experimental results, we used a
fluctuating-surface-current formulation of the radiative heat transfer
problem combined with the boundary element method, as
implemented by us in the SCUFF-EM solver. This allows NFRHT
calculations between bodies of arbitrary shape and provides numerically
exact results within the framework of fluctuational electrodynamics
in the local approximation. For our calculations, we characterized
the dielectric function of SiN, whereas the dielectric functions
for SiO2 and Au were taken from previous work (see Supplementary
Information section 12 and Supplementary Fig. 12). To simulate our
experiments accurately, we considered the tip-substrate geometries 
shown in the left insets of Fig. 4c, d. Here, the tip has a conical shape 
and ends in a spherical cap whose radius was obtained from scanning
electron microscope (SEM) images of the probes (see Supplementary 
Figs 1-3). In our simulations, we included sufficiently large areas of 
the probe's conical part and the substrate such that the results do not 
depend on their finite size (see Supplementary Information section 14 
and Supplementary Fig. 13). To maintain high fidelity to the experi- 
mental conditions, we also accounted for the small roughness of our 
probes by including random Gaussian-correlated noise in the tip pro- 
file (Fig. 4c, d). More precisely, the maximum protrusion height on 
the tip and the correlation length between protrusions were chosen to 
be 10 nm and 17 nm, respectively, on the basis of the surface
characteristics observed in the SEM images (Supplementary Figs 1-3). We
investigated the effect of surface roughness by computing $G_{\mathrm{eNFRHT}}$ for
every material from 15 different tip-substrate ensembles with rough- 
ness profiles generated as described earlier. The computational results 
for the different materials are presented in Fig. 3a, b, d. As pointed out 
earlier, we indeed find very good agreement between computation and 
experiment without any adjustable parameters. 
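The text gives only the two roughness parameters (10 nm maximum protrusion, 17 nm correlation length). As an illustration, and assuming the correlation length is the 1/e length of a Gaussian autocorrelation, a one-dimensional line profile with those statistics can be generated by convolving white noise with a Gaussian kernel and rescaling. This is not the authors' procedure, which applied roughness to a three-dimensional conical tip; the function name and the convolution approach are our assumptions.

```python
import math
import random


def rough_profile(n_points, dx_nm, corr_len_nm=17.0, max_height_nm=10.0, seed=None):
    """Random 1D height profile (nm) with autocorrelation ~ exp(-x^2 / corr_len^2).

    White Gaussian noise is convolved with exp(-2 x^2 / corr_len^2); the
    autocorrelation of that kernel is the target Gaussian. The result is
    rescaled so that its largest excursion equals max_height_nm.
    """
    rng = random.Random(seed)
    half = math.ceil(3 * corr_len_nm / dx_nm)  # truncate the kernel at ~3 correlation lengths
    kernel = [math.exp(-2 * (j * dx_nm) ** 2 / corr_len_nm ** 2) for j in range(-half, half + 1)]
    noise = [rng.gauss(0.0, 1.0) for _ in range(n_points + 2 * half)]
    profile = [sum(w * noise[i + j] for j, w in enumerate(kernel)) for i in range(n_points)]
    peak = max(abs(h) for h in profile)
    return [h * max_height_nm / peak for h in profile]


# One hypothetical 500 nm long line profile sampled every 1 nm.
profile = rough_profile(500, 1.0, seed=1)
print(f"max |h| = {max(abs(h) for h in profile):.1f} nm")
```

Drawing many such profiles with different seeds mirrors, in spirit, the ensemble of 15 rough tips used for the averaged computations.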

To elucidate the underlying physical mechanism and explain the 
differences in eNFRHT between different material combinations, we 
computed the spectral conductance (heat conductance per unit of 
energy) for several gap sizes as shown in Fig. 4a, b for SiO2 and Au,
respectively (see Supplementary Fig. 14 for SiN results). In Fig. 4a, one
can see that the dominant contributions to the spectral conductance of
SiO2 come from two narrow energy ranges centred around ~0.06 eV
and ~0.14 eV, which correspond to the energies of the transverse
optical phonons of SiO2. This strongly suggests that for SiO2, eNFRHT is
dominated by surface phonon polaritons (SPhPs), as previously found
for larger gaps. In turn, this explains the marked decrease in heat
transfer as the gap size increases, which is a consequence of the rapid 
decrease in the number of available surface electromagnetic modes 
for radiation to tunnel across the vacuum gap. In contrast, eNFRHT 
for Au exhibits a rather broad spectral conductance that decays more 
slowly with gap size (Fig. 4b). This slow decay is reminiscent of the 
situation encountered in a plate-plate geometry, where NFRHT is
dominated by frustrated internal reflection modes, that is, by modes
that are evanescent in the vacuum gap but propagate inside the
Au tip and substrate, and whose contribution saturates for gaps below
the skin depth, which for Au is around 25 nm. This naturally explains
the weaker dependence of eNFRHT on gap size observed in our Au-Au 
measurements. The fundamental difference in eNFRHT between die- 
lectrics and metals is also apparent from the computed Poynting-flux 
Figure 3 | Measured extreme near-field thermal conductances for dielectric
and metal surfaces. a, Measured near-field radiative conductance between
a SiO2-coated probe (310 K) and a SiO2 substrate at 425 K. The red solid line
shows the average conductance from 15 independent measurements, and the
light red band represents the standard deviation. The blue solid line shows
the average of the computed radiative conductance for 15 different tips with
stochastically chosen roughness profiles (root-mean-squared roughness of
~10 nm) and a tip diameter (450 nm) obtained from SEM images of the probe.
Figure 4 | Spectral conductance and spatial distribution of the
Poynting flux. a, Spectral conductance as a function of energy for a SiO2
tip-substrate geometry for three different gap sizes. The tip diameter is
450 nm, and the reservoir temperatures are 310 K for the tip and 425 K for
the substrate. Note the logarithmic scale on the vertical axis. b, Same as
a, but for Au. In this case, the tip radius is 450 nm, and the tip and substrate
temperatures are 300 K and 301 K, respectively. c, Surface-contour plot
showing the spatial distribution of the Poynting-flux pattern on the
for Au-Au. d, Near-field conductance from experiments with a Au-coated
patterns on the surfaces (Fig. 4c, d), which show that eNFRHT in
the SiO2 case is much more concentrated in the tip apex than it is in 
the Au case. This difference reflects the fact that in a polar dielectric, 
such as SiO2, eNFRHT has a very strong distance dependence due to
the excitation of SPhPs with a penetration depth comparable to the
gap size. Given these differences between metals and dielectrics, it is
not surprising that Au-Au eNFRHT is relatively insensitive to small 
surface roughness (see Supplementary Fig. 15). For this reason, the 
large differences between our results for Au and those of previous 
work, which disagree with the predictions of fluctuational
electrodynamics, cannot be attributed to differences in the surface roughness.
Our computational results, when compared with our experimental 
data, provide unambiguous evidence that fluctuational electrodynam- 
ics accurately describes eNFRHT. 
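To put the length and energy scales of this discussion side by side, the short sketch below converts the two SiO2 spectral peaks into free-space wavelengths and estimates the field penetration depth of a Drude metal at an infrared photon energy. The plasma energy (9.0 eV) and damping (0.035 eV) are assumed, textbook-order values for Au, not the dielectric data used in the simulations, so the result should only be compared with the ~25 nm skin depth quoted above at the order-of-magnitude level.

```python
import cmath

HC_EV_NM = 1239.84198     # h*c in eV nm
HBARC_EV_NM = 197.326980  # hbar*c in eV nm


def wavelength_um(energy_ev):
    """Free-space wavelength (um) of a photon with the given energy (eV)."""
    return HC_EV_NM / energy_ev / 1000.0


def drude_skin_depth_nm(energy_ev, plasma_ev=9.0, damping_ev=0.035):
    """Field penetration depth c / (omega * Im n) of a Drude metal, in nm.

    eps = 1 - wp^2 / (w * (w + i*gamma)) with all energies in eV; the default
    plasma and damping energies are assumed values for Au.
    """
    eps = 1 - plasma_ev ** 2 / (energy_ev * (energy_ev + 1j * damping_ev))
    n = cmath.sqrt(eps)  # principal root; Im(n) > 0 because Im(eps) > 0
    return HBARC_EV_NM / (energy_ev * n.imag)


for e in (0.06, 0.14):  # SiO2 peaks of the spectral conductance quoted in the text
    print(f"{e:.2f} eV -> {wavelength_um(e):.1f} um")
print(f"Drude Au penetration depth at 0.1 eV ~ {drude_skin_depth_nm(0.1):.0f} nm")
```

Both the micrometre-scale thermal wavelengths and the tens-of-nanometre metal penetration depth are far larger than the single-digit-nanometre gaps, which is what places these measurements deep in the near field.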

We note that the results presented here provide, to our knowledge, the first
experimental evidence for extremely large enhancements of
radiative heat transfer in the extreme near field between both dielectric 
and metal surfaces. Furthermore, our results establish the fundamen- 
tal validity of fluctuational electrodynamics in modelling eNFRHT 
and NFRHT. The technical advances described in this work are key 
to systematically investigating eNFRHT phenomena in a variety of 
materials and nanostructures, and provide critical information that 
complements insights that can be obtained by other near-field
techniques. Knowledge gained from such studies will be critical to the
development of future technologies that leverage nanoscale radiative
heat transfer.
Self-shaping of oil droplets via the formation of
intermediate rotator phases upon cooling 


Nikolai Denkov¹, Slavka Tcholakova¹, Ivan Lesov¹, Diana Cholakova¹ & Stoyan K. Smoukov²


Revealing the chemical and physical mechanisms underlying 
symmetry breaking and shape transformations is key to 
understanding morphogenesis. If we are to synthesize artificial
structures with similar control and complexity to biological systems,
we need energy- and material-efficient bottom-up processes to
create building blocks of various shapes that can further assemble
into hierarchical structures. Lithographic top-down processing
allows a high level of structural control in microparticle production
but at the expense of limited productivity. Conversely, bottom-up
particle syntheses have higher material and energy efficiency,
but are more limited in the shapes achievable. Linear hydrocarbons
are known to pass through a series of metastable plastic rotator
phases before freezing. Here we show that by using appropriate
cooling protocols, we can harness these phase transitions to control 
the deformation of liquid hydrocarbon droplets and then freeze 
them into solid particles, permanently preserving their shape. Upon 
cooling, the droplets spontaneously break their shape symmetry 
several times, morphing through a series of complex regular 
shapes owing to the internal phase-transition processes. In this 
way we produce particles including micrometre-sized octahedra, 
various polygonal platelets, O-shapes, and fibres of submicrometre 
diameter, which can be selectively frozen into the corresponding 
solid particles. This mechanism offers insights into achieving 
complex morphogenesis from a system with a minimal number of 
molecular components. 

We illustrate the capabilities of this new approach by using droplets 
of different linear hydrocarbons with 14-20 carbon atoms (namely 
from tetradecane to eicosane). The alkanes were pre-dispersed as
droplets in 1.5 wt% aqueous surfactant solution; these droplets can
afterwards be transformed into a variety of solid particles with different
shapes through the choice of appropriate surfactants and controlled 
cooling rates (Fig. 1 and Extended Data Fig. 1). Figure 2 shows how 
the choice of surfactant can influence the particle shape and aspect 
ratio. Upon freezing, many of the thin, high-aspect-ratio structures 
obtained develop a puncture hole in their interior, owing to the volu- 
metric shrinkage accompanying solidification. 

Our experiments show that the drop-shape transformations and 
the final shape of the frozen particles depend most on three factors: 
surfactant type, cooling rate, and initial droplet size. We outline the 
main effects of these control factors and, afterwards, we explain the 
basic mechanism of the drop shape evolution. 

Surfactants are amphiphilic molecules with a hydrophilic head- 
group (ionic or non-ionic) and hydrophobic alkyl chain. While only 
two of our surfactants were ultra-pure (C16H33N(CH3)3Br (CTAB) and
C14H29SO4Na, with purity >99%), our experiments with more than
ten surfactants of all types (anionic, cationic and non-ionic) showed 
the same general sequence of shape transformations (Fig. 1). These 
transformations occurred only when the surfactant chain length was 
similar to or longer than the length of the hydrocarbon molecules in 
the droplets. Such long-chain surfactants can freeze in the adsorption 


layer on the drop-water interface, before the freezing of the alkane in 
the droplet interior, and thus have a critical role in the formation of
drops with non-spherical shapes. The use of surfactants with shorter 
chain lengths led to drops freezing into spherical solid particles, with- 
out any peculiar shape transformations. 

The rate of cooling is another crucial factor in the observed phenomenon.
At slow and moderate cooling rates (below about 4 K min⁻¹),
the spherical hydrocarbon drops undergo a series of shape transformations.
Figure 1 illustrates the case of hexadecane drops in water containing
1.5 wt% of the non-ionic surfactant C16H33(CH2CH2O)20OH
(Brij 58). Initially, the spherical drops transform into regular octahedra
(regular polyhedra whose surface is shaped by eight triangular facets)
that then transform into flat platelets with a hexagonal base. Upon
further cooling, these hexagons transform into either triangular or
tetragonal platelets, the ratio of which depends on the surfactant, the
cooling rate, and the initial size of the droplets. Subsequently, rod-like
asperities with diameters of around 5 μm appear and grow into long
filaments from the platelet tips. Finally, if the cooling is sufficiently
slow (less than 0.5 K min⁻¹), these asperities elongate further to form
very thin fibres with diameters of around 0.5 μm. When the cooling
rate was varied between 0.01 and 2 K min⁻¹, each transformation
took between 30 s and several minutes (see Supplementary Video 1).
Depending on the cooling rate, the drops froze into solid particles
at different stages of this evolution path: slower cooling led
to freezing at a later stage. Thus, by using an appropriate intermediate
cooling rate, we could transform an intermediate drop shape into a
solid frozen particle with the same shape (Figs 1 and 2; alternatively,
one can apply step-acceleration cooling to freeze the deformed drops).
For example, using Brij 58 as surfactant, at 0.2 K min⁻¹ cooling with
30 μm droplets, we obtained 25 ± 5% triangles and 75 ± 5% rhomboids
that evolve into rod-shaped particles and then finally into fibres, as
determined from observations of over 100 droplets in more than 10
independent experiments. For comparison, when using 10 μm droplets
with Tween 60 at 0.2 K min⁻¹ cooling, we obtained more than 90%
rod-shaped particles.

Drop size was another important factor. The images shown in Figs 1 
and 2 were obtained with drops of initial diameter around 20 μm. Very
similar results were obtained with drops of diameter between 1 and
50 μm. At low cooling rates (0.01–2 K min⁻¹), both small and large
drops evolved in shape when appropriate surfactants were used.
However, at higher cooling rates the big drops tended to freeze into
spherical solid particles without shape transformations, while the
smaller drops readily evolved in shape. Thus we could induce shape
transformations of drops with diameters of 1–50 μm in a wide range of
cooling rates (0.01–2 K min⁻¹). Note that micrometre-sized droplets
undergo intensive Brownian motion, which, however, does not suppress
the shape transformations. The investigation of submicrometre droplets
is deferred to a separate study because observing their evolution
requires much more sophisticated experimental methods.
Figure 1 | Schematic of the shape transformations observed during
cooling of emulsion droplets of pure hydrocarbons in water, in the 
presence of 1.5 wt% surfactant. a, Under slow cooling the spheres 
transform consecutively into regular octahedra, hexagonal platelets, 
triangular (or tetragonal) platelets, platelets with long asperities and, 


Surprisingly, all of the observed shape transformations are governed
by a single mechanism, which we deduced from the experimental
observations described earlier as follows.

The requirement for a long surfactant chain-length indicates that the 
shape transformations are triggered by the freezing surfactant adsorp- 
tion layers, formed on the hydrocarbon-water interface. Numerical 
estimates described later demonstrate that these adsorption layers do 
not possess a sufficiently high bending moment to deform the hydro- 
carbon drops. Therefore, before freezing, the formation of mesomor- 
phic hydrocarbon phases, just inside the surface of the liquid drops, 
is required to trigger the observed transformations. Indeed, the drop- 
shape transformations start around the freezing temperature of the 
bulk hydrocarbon phase transition (18°C for hexadecane). In this 
temperature range, linear hydrocarbons form so-called ‘rotator’ mes- 
omorphic phases, which represent a class of plastic phases in which 
the molecules possess long-range translational order, yet rotate freely 
around their long axis. Owing to their positional long-range order,
the rotator phases generate non-isotropic elastic stresses, which are 




Figure 2 | Choice of surfactant and cooling rate can determine particle 
shape and aspect ratio. a–e, Microscope images of deformed liquid
droplets (top) and of the solid particles obtained from these droplets 
(bottom), upon cooling of hexadecane-in-water emulsions, stabilized by 
eventually, thin fibres. b, Under moderate cooling the droplet surface
corrugates and forms a hexagonal platelet with protrusions, before
forming the regular hexagonal platelet. c, Hexadecane liquid droplets in
Brij 58 solution at different stages. d, Solid particles with different shapes,
obtained at appropriate cooling rates. Scale bars, 20 μm.


sufficiently high to deform liquid drops, overcoming their interfacial 
tension.

Images of deformed drops clearly illustrate the elastic nature of the 
fluid material confined inside the drops (Fig. 3b and Supplementary 
Video 2). For non-elastic materials, the long filaments growing from 
the tips of the triangular platelet shown in Fig. 3a would be unstable
and should break into small spherical droplets through the so-called
Plateau-Rayleigh capillary instability, under the action of capillary
pressure, which destabilizes cylindrical liquid jets. The angular
doughnut-shaped droplets, shown in Fig. 3b, would also be unstable
in shape, unless the drop material possesses elasticity to counteract
the capillary pressure, which forces ordinary liquid drops to acquire a
spherical shape.
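The instability invoked here is the classical Plateau-Rayleigh result: for a purely liquid cylinder of radius R, whose energy is simply interfacial tension times area, a varicose perturbation of wavelength λ lowers the surface area at fixed volume whenever λ > 2πR. The sketch below checks this numerically by integrating the area of a perturbed cylinder; it is a standard textbook argument rather than an analysis from this work, and the function names and the 5% perturbation amplitude are our own choices.

```python
import math


def perturbed_cylinder_area(radius, wavelength, eps, n=20000):
    """Lateral area of one wavelength of r(z) = r0 + eps*cos(k z) at fixed volume.

    Volume conservation over a period requires r0^2 + eps^2/2 = radius^2.
    """
    k = 2 * math.pi / wavelength
    r0 = math.sqrt(radius ** 2 - eps ** 2 / 2)
    dz = wavelength / n
    area = 0.0
    for i in range(n):
        z = (i + 0.5) * dz  # midpoint rule, very accurate for periodic integrands
        r = r0 + eps * math.cos(k * z)
        drdz = -eps * k * math.sin(k * z)
        area += 2 * math.pi * r * math.sqrt(1 + drdz ** 2) * dz
    return area


def is_unstable(radius, wavelength, eps_frac=0.05):
    """True if a small varicose perturbation lowers the area of a liquid cylinder."""
    reference = 2 * math.pi * radius * wavelength  # unperturbed lateral area
    return perturbed_cylinder_area(radius, wavelength, eps_frac * radius) < reference


for lam in (4.0, 5.0, 8.0, 10.0):  # wavelengths in units of the cylinder radius
    print(f"lambda = {lam:4.1f} R -> unstable: {is_unstable(1.0, lam)}")
```

Long filaments are unstable to many such long-wavelength modes, so their persistence in Fig. 3a points to an elastic contribution that the purely capillary model lacks.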

The simplest possible explanation of the observed non-spherical 
drops could be that the rotator phases, structured inside the entire 
volume of the drop interior, create high elastic stresses that deform 
the drop surface. However, some of our results contradict this simplest 
explanation. For example, the optical microscopy observations of the 
various surfactants: a, b, Tween 60; c, d, Tween 40; e, Brij 78. The numbers
indicate the aspect ratio. The aspect ratio of the various shapes along the 
drop evolution (compare with Fig. 1) depends on the specific surfactant 
used to stabilize the drops. Scale bars, 20 μm.
Figure 3 | Illustrative examples of liquid drop shapes that would be
unstable in the absence of elastic properties of the fluid-drop material. 
a, Image of a triangular platelet with very long cylindrical asperities, 
protruding from the platelet tips. b, Image of rhomboid-shaped liquid 
drops, some with a hole in their centre, a shape clearly not governed by 
surface tension only. Such images prove that the fluid drops contain plastic 
phases, while still containing liquid inside (the oval shape in the puncture 
determined by surface tension). The coloured, bottom-most shape is the 
only one frozen. Scale bars, 50 μm.


liquid droplets of various shapes under cross-polarized light showed 
only faint colours, in contrast to the characteristic intense colours that 
are typically observed for thick non-isotropic liquid crystal layers. If
the interior of the deformed liquid drops were entirely filled with an 
anisotropic plastic phase, the liquid drops would have appeared more 
coloured in cross-polarized light. We do observe the appearance of 
beautiful intense colours, but only at the moment of complete freezing
of the droplets, indicating the formation of crystal domains in the
frozen particles (Figs 1 and 2). 

From these results we concluded that the freezing surfactant 
adsorption layer induces the formation of a thin layer of a hydrocarbon
plastic rotator phase of thickness $h_{\mathrm{pl}}$, adjacent to the drop
surface, which in turn, drives the observed drop-shape transfor- 
mations (Fig. 4). Surface-induced formation of liquid crystal phase 
layers has been observed previously and, as shown later, such
plastic interfacial sheets possess a sufficiently large bending moment, 
able to counteract the effects of the hydrocarbon-water interfacial 
tension that enforces the spherical shape of the common liquid 
drops. Being submicrometre in thickness, these sheets appear only 
Figure 4 | Schematic presentation of the drop-shape deformation
mechanism, driven by the formation of interfacial plastic phases.
a-c, Cross-sections of a platelet-shaped particle with protrusions (a)
are shown in b, c. In the presence of appropriate surfactant, thin plastic
phases (brown regions) with thickness $h_{\mathrm{pl}}$, bending elasticity constant $K_{\mathrm{B}}$
and characteristic curvature of the shaped droplet edges, $r$, form at the
hydrocarbon-water interface, adjacent to the surfaces with high curvature. 
The low-temperature-induced, highly curved plastic phases form an 
energetically favourable expanding frame at the drop edges, which drives 
the observed shape transformations. For clarity, the figure is not to scale. 
with faint colours in cross-polarized light, just as observed in our
experiments. 

The thickness of these plastic interfacial sheets, $h_{\mathrm{pl}}$, could be
estimated by considering the balance of the bending and surface area
energies of the drop surface. The surface area energy is represented
by the interfacial tension, $\gamma$, which was measured to be between 5 and
10 mJ m⁻² for the surfactant-containing systems studied. The bending
energy per unit area of the hydrocarbon-water interface could be
estimated as $E_{\mathrm{B}} \approx K_{\mathrm{B}}/r^{2}$, where $K_{\mathrm{B}}$ is the bending elasticity constant
and $r \approx 1$ μm is the observed characteristic radius of curvature of the shaped
droplet edges. Such stable, highly deformed droplets could be formed
only if the highly curved phases are energetically favourable, and if
$E_{\mathrm{B}}$ is comparable to or bigger than the interfacial tension $\gamma$ (both measured
in J m⁻²), which would pull the surface back to the lower spherical
curvature. This leads to the minimum estimate $K_{\mathrm{B}} \gtrsim \gamma r^{2} \approx 10^{-14}$ J, a
value much higher than the known bending constants of frozen
lipid bilayers or surfactant adsorption monolayers, $K_{\mathrm{B}} \approx 10^{-18}$ J.
Taking into account that $K_{\mathrm{B}}$ is proportional to $h_{\mathrm{pl}}^{2}$, we estimate that
$h_{\mathrm{pl}} \approx 300$ nm for the observed deformed drops. Similar values of $h_{\mathrm{pl}}$
were reported in independent experimental studies for surface-induced
liquid crystal sheet phases. As already explained, the bending
forces resulting from $h \approx$ 1.5–3 nm for surfactant monolayers and lipid
bilayers, and the respective bending constant of $K_{\mathrm{B}} \approx 10^{-18}$ J, are
far too weak to deform liquid drops for the interfacial tension values
measured in our systems.
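The order-of-magnitude estimate above can be reproduced in a few lines, assuming the relations stated in the text: $K_{\mathrm{B}} \gtrsim \gamma r^{2}$ with $r \approx 1$ μm and γ = 5–10 mJ m⁻², and $K_{\mathrm{B}} \propto h_{\mathrm{pl}}^{2}$ scaled from a reference layer with $K_{\mathrm{B}} \approx 10^{-18}$ J. Taking the reference thickness as 3 nm (the upper end of the quoted 1.5–3 nm) is our assumption; using 1.5 nm would halve the resulting thickness.

```python
import math


def min_bending_constant(gamma_j_m2, r_m):
    """Smallest K_B (J) for which E_B = K_B / r^2 reaches the interfacial tension."""
    return gamma_j_m2 * r_m ** 2


def plastic_layer_thickness(k_b, k_ref=1e-18, h_ref=3e-9):
    """Thickness (m) implied by K_B proportional to h^2, scaled from a reference layer.

    k_ref, h_ref: bending constant (J) and thickness (m) of a surfactant
    monolayer or lipid bilayer; h_ref = 3 nm is an assumed reference value.
    """
    return h_ref * math.sqrt(k_b / k_ref)


r = 1e-6  # characteristic radius of curvature of the drop edges (m), from the text
for gamma_mn_m in (5.0, 10.0):  # measured interfacial tension range (mN/m = mJ/m^2)
    k_min = min_bending_constant(gamma_mn_m * 1e-3, r)
    h = plastic_layer_thickness(k_min)
    print(f"gamma = {gamma_mn_m:4.1f} mN/m -> K_B >= {k_min:.1e} J, h_pl ~ {h * 1e9:.0f} nm")
```

Across the measured tension range the implied thickness stays at a few hundred nanometres, two orders of magnitude thicker than a single adsorbed surfactant layer.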

The elastic layers seem mostly localized at the shape edges and are 
characterized also by their ‘spontaneous curvature’, that is, the curvature
that the interface would acquire if no other forces (besides the
local intermolecular forces) were involved. All our observations show 
that the shape transformations are driven by the growth of edges with 
high spontaneous curvature (small radius of curvature) in one direc- 
tion, thus forming cylindrical structures in the drop edges. These ener- 
getically favourable, cylindrical plastic crystal phases grow in length, 
thus forming elastic frames that overcome the interfacial tension and 
stretch the droplets to flatter shapes with high aspect ratios. Indeed, 
as illustrated in Fig. 3 and Supplementary Video 2, upon puncture, 
the inner contour of the punctured rhomboid liquid droplets acquires the minimum
circumference dictated by interfacial tension, while the outer shaped
frame remains intact. Figure 1 and Supplementary Video 1 show how
the plastic crystals grow and disproportionate into different straight 
edges to form droplet shapes with longer and longer circumferences, 
until finally forming highly elongated drops and thin fibres of radius 
less than 0.4 μm. Such fibres contain all the material in a single ‘edge’
region, probably all composed of a rotator phase and providing an 
estimate for the spontaneous curvature of these phases. 

Our observations with droplets of several linear hydrocarbons show 
that no specially designed molecules are needed to observe this drop 
‘self-shaping’ phenomenon. By selecting hydrocarbons with appropri- 
ate chain length, we varied the temperature range for the liquid drop 
transformations: the tetradecane drops had non-spherical shapes in 
a range of 0 to 3°C, hexadecane drops in a range of 9 to 18°C, and 
eicosane drops in a range of 30 to 35°C. 
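A direct consequence of a linear cooling ramp is that the time available for the shape transformations scales inversely with the cooling rate. The sketch below computes this residence time from the temperature windows quoted above; it assumes a constant cooling rate across the window and that the windows apply irrespective of surfactant, which the text does not state explicitly. The dictionary and function names are hypothetical.

```python
# Temperature windows (deg C) in which non-spherical drops were observed, from the text.
WINDOWS = {
    "tetradecane": (0.0, 3.0),
    "hexadecane": (9.0, 18.0),
    "eicosane": (30.0, 35.0),
}


def minutes_in_window(alkane, rate_k_per_min):
    """Time (min) a linear cooling ramp spends crossing the alkane's deformation window."""
    low, high = WINDOWS[alkane]
    return (high - low) / rate_k_per_min


for rate in (0.01, 0.2, 2.0):  # cooling rates (K/min) within the range studied
    print(f"hexadecane at {rate} K/min: {minutes_in_window('hexadecane', rate):.1f} min")
```

At 2 K min⁻¹ a hexadecane drop spends only a few minutes in its window, comparable to the 30 s to several minutes reported per transformation, which is in line with faster cooling capturing earlier shapes.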

The growth of smectic liquid crystalline fibres in CTAB solu- 
tions above the critical micellar concentration has previously been 
observed. However, in these papers only one type of structure was
observed (rod-shaped particles), whereas we are able to produce particles
with a wide variety of shapes in a controlled manner. The drop-shape
sequence of transformations we observe is much richer and
probably different in mechanism from those observed previously.
Also, hexagonal drop shapes of lyotropic liquid crystal phases have
been observed before. However, the full array of complex shapes
shown in Figs 1 and 2, and the possibility of transforming between 
them and of capturing them in a frozen state, are novel (see also 
Extended Data Figs 1 and 2). 

Our approach can be used to produce ‘shape-on-demand’ particles, 
noting that high-aspect-ratio micro/nanoparticles show preferential 
internalization in tumour cells and that tissue/organ uptake can be
shape specific. A combination with microfluidic techniques
seems particularly suitable to explore the full range of such opportu- 
nities and the self-shaping method’s governing mechanism of sym- 
metry breaking. By controlling the local temperature profile in the 
microfluidic channel, custom shaped particle populations or mixtures 
of specific shapes and various sizes could be produced. The obtained 
shaped particles and fibres could be used to build hierarchical struc- 
tures or as sacrificial templates for the production of porous materials 
with complex morphologies. 

We report a novel bottom-up mechanism for morphogenesis and 
an energy- and material-efficient method for the formation of micro- 
and nanoscale liquid drops and solid particles with complex shapes. 
The ability of a single fluid phase to spontaneously form the wide
variety of shapes we report could decrease the perceived
informational complexity of many structures. This shape-shifting is probably
used in nature; it is of clear relevance to the emerging field of active
matter, and it is expected to be applicable to other rotator-phase- and
plastic-phase-forming molecules and biomolecules.

The morphogenesis mechanism we present is expected to stimulate research in a number of fields, as the observed phenomena combine several active research areas, such as capillarity and elasticity, liquid crystal and plastic phases in confined spaces, and surface and bulk nucleation. The process is probably a good platform for investigating phase equilibria, the role of confinement and the melting of two-dimensional crystals, as well as the interplay between liquid crystal defects and surface bending elasticity resulting in shape changes”””®. It is of particular interest for elucidating novel mechanisms of symmetry breaking that contribute to understanding the fundamental processes of morphogenesis.
METHODS

The alkanes (>99% purity) were purchased from Sigma-Aldrich and further purified by passing them several times through columns of Florisil to remove polar components. In the absence of surfactant, the hydrocarbon-water interfacial tension was always above 50 mN m⁻¹, as known for pure alkanes. The hydrocarbon-water interfacial tension with water containing 1.5 wt% surfactant, γ, was measured by drop-shape analysis (instrument DSA100 by Krüss) to be between 5 and 10 mN m⁻¹ for all surfactant systems studied, over the entire range from room temperature down to the temperatures of drop deformation and freezing.

The original hydrocarbon-in-water emulsions were prepared by membrane emulsification with glass (SPG) membranes of 2, 3, 5 or 10 μm pore size in 1.5 wt% solutions of Brij 58 (Fig. 1), Tween 40, Tween 60 or Brij 78 (Fig. 2 and Extended Data Fig. 1), CTAB (Extended Data Fig. 1) or other surfactants (images not shown). All surfactants were chosen to be water soluble with a high hydrophilic-lipophilic balance (HLB > 14), so that the surfactant would be almost exclusively in the water phase. Extended Data Table 1 summarizes the HLB values.


The emulsion cooling was realized in rectangular glass capillaries of length 50 mm, width 1 mm and height 0.1 mm, enclosed within a custom-made metal cooling chamber with optical windows for microscope observation (Extended Data Fig. 4). The chamber temperature was controlled by a cryo-thermostat (Julabo CF30) and measured close to the emulsion location using a calibrated thermocouple probe with an accuracy of ±0.2 °C.

The optical observations were performed with Axioplan and AxioImager M2.m microscopes (Zeiss) in transmitted, cross-polarized white light, with a compensator plate placed after the sample and before the analyser, at 45° with respect to both the analyser and the polarizer. Long-focus objectives (×20 and ×50) were used. The drop diameter was determined from microscope images.

The average ‘height’ of the deformed drops was calculated by dividing the 
total volume of the drop (calculated from the radius of the initial spherical 
drops) by the projected area of the non-spherical drop shapes, measured from 
the microscope images. In this way the aspect ratios, shown in Figs 1 and 2, 
were determined. 
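The height estimate above is simple enough to sketch directly. The Python below is a minimal illustration, not the authors' analysis code: the function names are hypothetical, and because the text does not say which lateral dimension enters the aspect ratio, the sketch assumes (as a labelled, hypothetical choice) the diameter of a circle with the same projected area.

```python
import math


def mean_height(initial_diameter_um: float, projected_area_um2: float) -> float:
    """Average 'height' of a deformed drop, in micrometres.

    The drop volume is taken from the initial spherical drop (volume is
    conserved on deformation) and divided by the projected area measured
    from the microscope image.
    """
    radius = initial_diameter_um / 2.0
    volume = 4.0 / 3.0 * math.pi * radius ** 3
    return volume / projected_area_um2


def aspect_ratio(initial_diameter_um: float, projected_area_um2: float) -> float:
    """Lateral size divided by mean height.

    Assumption (not stated in the text): the lateral size is the diameter
    of a circle with the same projected area as the deformed drop.
    """
    equivalent_diameter = 2.0 * math.sqrt(projected_area_um2 / math.pi)
    return equivalent_diameter / mean_height(initial_diameter_um, projected_area_um2)


d0 = 15.0                                # initial drop diameter, um
sphere_area = math.pi * (d0 / 2.0) ** 2  # projected area of the undeformed drop
print(mean_height(d0, sphere_area))      # about 10 um, i.e. 2*d0/3 for a sphere
print(aspect_ratio(d0, 10 * sphere_area))  # a drop flattened to 10x its area
```

As a sanity check, an undeformed sphere gives a height of 2d/3 and an aspect ratio of 1.5 under this definition; flattening the same volume into ten times the projected area drops the mean height to about 1 μm.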
Extended Data Figure 1 | Solid particles with various shapes. a-h, The shapes were obtained by freezing of deformed hexadecane (a-d), heptadecane (e), tetradecane (f) or eicosane (g, h) drops in emulsions stabilized by 1.5 wt% of different surfactants: a, non-ionic Tween 60; b, c, non-ionic Brij 78; d, cationic CTAB; e, f, non-ionic Tween 40; g, h, Brij 78. Scale bars, 20 μm.
Extended Data Figure 2 | Snapshot images of multiple hexadecane particles, obtained in 1.5 wt% solutions of various surfactants. a-h, Tween 60 (a-e), Brij 58 (f) and Tween 40 (g, h). a-d, Consecutive images from the evolution of emulsion droplets stabilized by Tween 60. e, Rod-like particles before freezing. f, Frozen triangular particles. g, Frozen tetragonal platelets. h, Frozen toroidal particles. The initial drop sizes of the particles are indicated on the pictures. Cooling rates are 0.5 K min⁻¹, except for h (2 K min⁻¹).
Extended Data Figure 3 | Images proving that the deformed drops are still fluid. Extending drops collide with each other and bend, as shown by the black arrow. Images of hexadecane drops in 1.5 wt% aqueous solution of Brij 58, cooled at a rate of 1 K min⁻¹; snapshots at t = 0 s, 6 s, 10 s and 15 s.
Extended Data Figure 4 | Experimental setup. a, Schematic presentation of the cooling chamber with optical windows, used for microscope observation of the emulsion samples. b, The studied emulsions are contained in glass capillaries, placed in the thermostatic chamber and observed through the optical windows.
Extended Data Table 1 | Hydrophilic-lipophilic balance values of the non-ionic surfactants used

| Surfactant | HLB value |
|---|---|
| Brij 58 | |
| Brij 78 | 15.3 |
| Tween 40 | 15.5 |
| Tween 60 | 14.9 |

HLB, hydrophilic-lipophilic balance.
Spatial and temporal distribution of mass loss from the Greenland Ice Sheet since AD 1900
Anders Schomacker, Camilla S. Andresen, Eske Willerslev & Kurt H. Kjær


The response of the Greenland Ice Sheet (GIS) to changes in temperature during the twentieth century remains contentious’, largely owing to difficulties in estimating the spatial and temporal distribution of ice mass changes before 1992, when Greenland-wide observations first became available”. The only previous estimates of change during the twentieth century are based on empirical modelling** and energy balance modelling®’. Consequently, no observation-based estimates of the contribution from the GIS to the global-mean sea level budget before 1990 are included in the Fifth Assessment Report of the Intergovernmental Panel on Climate Change’. Here we calculate spatial ice mass loss around the entire GIS from 1900 to the present using aerial imagery from the 1980s. This allows accurate high-resolution mapping of geomorphic features related to the maximum extent of the GIS during the Little Ice Age’ at the end of the nineteenth century. We estimate the total ice mass loss and its spatial distribution for three periods: 1900-1983 (75.1 ± 29.4 gigatonnes per year), 1983-2003 (73.8 ± 40.5 gigatonnes per year) and 2003-2010 (186.4 ± 18.9 gigatonnes per year). Furthermore, using two surface mass balance models!®"! we partition the mass balance into a term for surface mass balance (that is, total precipitation minus total sublimation minus runoff) and a dynamic term. We find that many areas currently undergoing change are identical to those that experienced considerable thinning throughout the twentieth century. We also find that the surface mass balance term has decreased considerably since 2003, whereas the dynamic term has been constant over the past 110 years. Overall, our observation-based findings show that during the twentieth century the GIS contributed at least 25.0 ± 9.4 millimetres of global-mean sea level rise. Our result will help to close the twentieth-century sea level budget, which remains crucial for evaluating the reliability of models used to predict global sea level rise’*.

We use aerial stereo photogrammetric imagery recorded during the period 1978-1987 to map trimlines and lateral and end moraines associated with the maximum extent of the GIS during the Little Ice Age (LIAmax), thereby quantifying vertical changes in ice surface elevation between the LIAmax and 1978-87 (Fig. 1, Methods). To obtain a rate of ice mass loss, the year AD 1900 is assigned as a Greenland-wide time stamp for when the glaciers started to retreat from their LIAmax position (although we note that this varies regionally and locally®!*'), and 1983 is assigned as the mean year of the aerial observations. Elevation differences after 1983 are derived from airborne and satellite altimetry, combined with a digital elevation model (DEM) developed from the aerial imagery (Methods). We use this geodetic approach to calculate spatially distributed ice thinning patterns and the mass balance of the GIS for three periods (Fig. 2a-c): LIAmax(1900) to 1983, 1983 to 2003, and 2003 to 2010. We omitted some areas of the GIS because of the lack of LIA data points (Methods).


Figure 1 | Three-dimensional models of Kangerlussuaq Glacier. a, Reconstruction of the LIAmax ice surface at 1900. b, The 2013 ice surface. c, Close-up of the northern rim of the 2013 ice surface. The base map is Landsat 8 satellite imagery from 2013. The LIA marks a cold period during which the GIS expanded, often associated with the time interval 1450-1850”. Spectacular indications that the GIS has been shrinking over the last century are the fresh trimlines, that is, the pronounced boundaries between abraded and less abraded bedrock on valley sides, and the fresh non-vegetated moraines close to the present glacier fronts in many areas of Greenland. Both features are considered to mark the culmination of LIA glacial advances and to have been formed mainly during the 1700s or at the end of the 1800s*”.
Figure 2 | Surface elevation change rates in Greenland since the LIA maximum. The colour scale applies to all panels. a-c, Estimates of surface elevation change rates during LIAmax(1900)-1983 (a), 1983-2003 (b) and 2003-2010 (c). The numbers listed below each panel are the integrated Greenland-wide mass balance estimates, expressed in gigatonnes per year and as millimetres per year of GMSL equivalent. The associated uncertainties include an uncertainty related to the scaling approach, an error related to observed changes during 2003-2010, and an uncertainty related to the scaling of the point-based observations. d-f, Total estimates of surface elevation change rates due to SMB fluctuations, using revised SMB estimates from ref. 10, during LIAmax(1900)-1983 (d), 1983-2003 (e) and 2003-2010 (f). g-i, The dynamically driven residual in elevation change rates during LIAmax(1900)-1983 (g), 1983-2003 (h) and 2003-2010 (i). Negative values indicate mass loss. Uncertainties are reported as 1σ. Labels in a refer to Jakobshavn Isbræ (JI), Kangiata Nunata Sermia (KNS), Frederikshåb Isblink (FIB), Qassimiut Lobe (QL), Kangerlussuaq Glacier (KG), Helheim Glacier (HG), Zachariae Isstrøm (ZI) and Nioghalvfjerdsfjorden Glacier (NG). Labels in c refer to north (N), northeast (NE), central east (CE), central west (CW), northwest (NW), southwest (SW) and southeast (SE).
Table 1 | Mass balance and components, LIAmax(1900)-2010

| Period | Term | GIS | SW | CW | NW | N | NE | CE | SE |
|---|---|---|---|---|---|---|---|---|---|
| LIAmax(1900)-1983 | Mass balance | −75.1 ± 29.4 | −8.7 ± 4.4 | −7.9 ± 4.1 | −27.6 ± 6.2 | −2.9 ± 3.7 | 2.8 ± 4.1 | −0.1 ± 2.3 | −30.6 ± 4.5 |
| | Revised SMB estimates (ref. 10) | 375.6 ± 84.5 | 39.9 ± 9.0 | 55.3 ± 12.4 | 65.6 ± 14.8 | 20.2 ± 4.5 | 8.5 ± 1.9 | 21.2 ± 4.8 | 164.9 ± 37.1 |
| | Dynamic residual | −450.7 ± 89.5 | −48.6 ± 10.0 | −63.2 ± 13.1 | −93.2 ± 16.0 | −23.1 ± 5.9 | −5.7 ± 4.5 | −21.4 ± 5.3 | −195.5 ± 37.4 |
| 1983-2003 | Mass balance | −73.8 ± 40.5 | −3.0 ± 2.9 | −5.9 ± 3.4 | −23.4 ± 6.4 | −4.7 ± 6.8 | 0.7 ± 8.9 | 0.6 ± 6.5 | −38.0 ± 5.5 |
| | Revised SMB estimates (ref. 10) | 427.6 ± 96.2 | 50.1 ± 11.3 | 63.3 ± 14.2 | 69.2 ± 15.6 | 24.0 ± 5.4 | 9.3 ± 2.1 | 25.7 ± 5.8 | 186.0 ± 41.9 |
| | Dynamic residual | −501.4 ± 104.4 | −53.2 ± 11.6 | −69.2 ± 14.6 | −92.6 ± 16.8 | −28.7 ± 8.7 | −8.6 ± 9.2 | −25.2 ± 8.7 | −224.0 ± 42.2 |
| 2003-2010 | Mass balance | −186.4 ± 18.9 | −29.7 ± 4.6 | −28.6 ± 3.0 | −47.4 ± 2.1 | −15.6 ± 1.3 | −7.2 ± 2.0 | −7.4 ± 2.2 | −50.5 ± 3.6 |
| | Revised SMB estimates (ref. 10) | 278.7 ± 62.7 | 6.9 ± 1.6 | 48.2 ± 10.9 | 50.9 ± 11.5 | 6.0 ± 1.4 | 5.1 ± 1.2 | 18.0 ± 4.0 | 143.5 ± 32.3 |
| | Dynamic residual | −465.2 ± 65.5 | −36.6 ± 4.9 | −76.8 ± 11.3 | −98.4 ± 11.7 | −21.7 ± 1.9 | −12.3 ± 2.4 | −25.4 ± 4.6 | −194.0 ± 32.5 |

Estimates of mass balance derived using the geodetic approach, the revised SMB estimates from ref. 10, and the dynamic residual of the GIS and the individual regions. Negative values indicate mass loss. Units, Gt yr⁻¹.
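Because the dynamic residual is obtained by subtracting the SMB contribution from the geodetic mass balance, every column of Table 1 should satisfy MB ≈ SMB + DR, and the seven regional columns should add up to the GIS column. The Python sketch below checks both relations to within rounding (0.1 Gt yr⁻¹). It also checks an observation that is our inference from the numbers rather than a statement in the text: each dynamic-residual uncertainty matches the quadrature sum of the mass balance and SMB uncertainties (for example, √(29.4² + 84.5²) ≈ 89.5). The values use the sign convention of Fig. 2 (negative values indicate mass loss); the data layout and function names are illustrative.

```python
import math

# Table 1 in Gt per year, stored as (value, 1-sigma). Negative values denote
# mass loss (the sign convention of Fig. 2). Column order follows the table.
REGIONS = ["GIS", "SW", "CW", "NW", "N", "NE", "CE", "SE"]

TABLE_1 = {
    "1900-1983": {
        "MB": [(-75.1, 29.4), (-8.7, 4.4), (-7.9, 4.1), (-27.6, 6.2),
               (-2.9, 3.7), (2.8, 4.1), (-0.1, 2.3), (-30.6, 4.5)],
        "SMB": [(375.6, 84.5), (39.9, 9.0), (55.3, 12.4), (65.6, 14.8),
                (20.2, 4.5), (8.5, 1.9), (21.2, 4.8), (164.9, 37.1)],
        "DR": [(-450.7, 89.5), (-48.6, 10.0), (-63.2, 13.1), (-93.2, 16.0),
               (-23.1, 5.9), (-5.7, 4.5), (-21.4, 5.3), (-195.5, 37.4)],
    },
    "1983-2003": {
        "MB": [(-73.8, 40.5), (-3.0, 2.9), (-5.9, 3.4), (-23.4, 6.4),
               (-4.7, 6.8), (0.7, 8.9), (0.6, 6.5), (-38.0, 5.5)],
        "SMB": [(427.6, 96.2), (50.1, 11.3), (63.3, 14.2), (69.2, 15.6),
                (24.0, 5.4), (9.3, 2.1), (25.7, 5.8), (186.0, 41.9)],
        "DR": [(-501.4, 104.4), (-53.2, 11.6), (-69.2, 14.6), (-92.6, 16.8),
               (-28.7, 8.7), (-8.6, 9.2), (-25.2, 8.7), (-224.0, 42.2)],
    },
    "2003-2010": {
        "MB": [(-186.4, 18.9), (-29.7, 4.6), (-28.6, 3.0), (-47.4, 2.1),
               (-15.6, 1.3), (-7.2, 2.0), (-7.4, 2.2), (-50.5, 3.6)],
        "SMB": [(278.7, 62.7), (6.9, 1.6), (48.2, 10.9), (50.9, 11.5),
                (6.0, 1.4), (5.1, 1.2), (18.0, 4.0), (143.5, 32.3)],
        "DR": [(-465.2, 65.5), (-36.6, 4.9), (-76.8, 11.3), (-98.4, 11.7),
               (-21.7, 1.9), (-12.3, 2.4), (-25.4, 4.6), (-194.0, 32.5)],
    },
}


def columns(period):
    """Return an iterator of (region, MB, SMB, DR) tuples for one period."""
    t = TABLE_1[period]
    return zip(REGIONS, t["MB"], t["SMB"], t["DR"])


def value_closure(period):
    """Largest |MB - (SMB + DR)| across columns; ~0 if DR = MB - SMB."""
    return max(abs(mb[0] - (smb[0] + dr[0])) for _, mb, smb, dr in columns(period))


def sigma_closure(period):
    """Largest |sigma_DR - sqrt(sigma_MB**2 + sigma_SMB**2)| across columns."""
    return max(abs(dr[1] - math.hypot(mb[1], smb[1])) for _, mb, smb, dr in columns(period))


def regional_sum_error(period, term):
    """|sum of the seven regional values - GIS value| for one row."""
    values = [v for v, _ in TABLE_1[period][term]]
    return abs(sum(values[1:]) - values[0])


for period in TABLE_1:
    print(period,
          f"value closure {value_closure(period):.2f}",
          f"sigma closure {sigma_closure(period):.2f}",
          f"MB regional-sum error {regional_sum_error(period, 'MB'):.2f}")
```

All three residuals stay at or below 0.1 Gt yr⁻¹, which is the rounding level of the published values.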


Figure 2a-c illustrates the annual mass balance for the three periods. We calculate a net mass loss of 6,233 ± 2,436 Gt (75.1 ± 29.4 Gt yr⁻¹) between the onset of glacial retreat from the LIAmax position (which we take to be 1900, as defined above) and 1983 (Fig. 2a). In northwest Greenland, where the majority of the ice sheet discharges through marine outlet glaciers, we find substantial and widely distributed thinning, leading to a mass loss of 27.6 ± 6.2 Gt yr⁻¹, corresponding to 37% of the total mass loss (Table 1). In west and southwest Greenland, we find peripheral thinning concentrated near the two large marine outlet glaciers Jakobshavn Isbræ and Kangiata Nunata Sermia. Substantial changes also occurred at the land-based glaciers Frederikshåb Isblink and Qassimiut Lobe, the latter being intersected by relatively small fjords draining its eastern part. Along the southeast coast, a region dominated by large marine outlet glaciers, thinning was extensive, in some areas propagating almost to the ice divide, causing a mass loss of 30.6 ± 4.5 Gt yr⁻¹ (41% of the total). Here two of the largest outlet glaciers in Greenland’, Kangerlussuaq Glacier and Helheim Glacier, show distinctly different patterns: Kangerlussuaq Glacier was the single largest point source of mass loss (10.6 ± 1.2 Gt yr⁻¹), accounting for 14% of the total ice sheet mass loss during this period, while Helheim Glacier appears to have been near balance (mass gain equivalent to mass loss), even though its front positions reveal a considerable inter-period variability?4 of about 9 km. In east, northeast and north Greenland, thinning is less extensive, and in some areas the ice margin remains at or very close to its LIAmax position, which in northern Greenland may be attributed to the confining effect of semi-permanent fjord ice on ice discharge’. The inference of persistent mass loss of the GIS since LIAmax may challenge the assumption of a near-balance ice sheet during the 1961-1990 period that is generally invoked to partition recent mass loss (that is, to attribute mass loss either to surface processes or to ice discharge); a failure to acknowledge mass loss during the reference period can thus result in overestimating the recent ice mass lost owing to surface mass balance (SMB) and ice dynamic processes!®.
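The percentages and the cumulative total quoted above follow from simple arithmetic on the rates. The short Python sketch below reproduces them; it assumes a constant mean rate over the 83-year period, and the names are illustrative.

```python
PERIOD_START, PERIOD_END = 1900, 1983
TOTAL_RATE = 75.1  # Gt/yr, GIS-wide mass loss, LIAmax(1900)-1983

# Mass loss rates (Gt/yr) quoted in the paragraph above.
SOURCES = {
    "northwest Greenland": 27.6,
    "southeast Greenland": 30.6,
    "Kangerlussuaq Glacier": 10.6,
}


def share_percent(rate: float, total: float = TOTAL_RATE) -> float:
    """Share of the GIS-wide mass loss, in percent."""
    return 100.0 * rate / total


def cumulative_loss(rate: float, start: int = PERIOD_START, end: int = PERIOD_END) -> float:
    """Rate times period length, in Gt (assumes a constant mean rate)."""
    return rate * (end - start)


for name, rate in SOURCES.items():
    print(f"{name}: {share_percent(rate):.0f}% of total")
print(f"cumulative: {cumulative_loss(TOTAL_RATE):.0f} Gt")
```

This gives the 37%, 41% and 14% shares and the 6,233 Gt total. Multiplying the rate uncertainty in the same way (29.4 Gt yr⁻¹ × 83 yr) gives about 2,440 Gt rather than 2,436 Gt, presumably because the published figures were computed from unrounded rates.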

We calculate a total mass loss of 1,475 ± 809 Gt (73.8 ± 40.5 Gt yr⁻¹) for the period 1983-2003 (Fig. 2b). In general, peripheral ice thinning was less widespread and many of the largest outlet glaciers showed decreasing mass loss (Table 1). During this period, 83% of the total mass loss occurred in the northwest and southeast, while Jakobshavn Isbræ alone accounted for 6%, indicating that loss in the remainder of the ice sheet was limited. Interestingly, a comparison of our estimate with studies that have higher temporal resolution suggests that most of the ice-sheet-wide mass loss that we record during 1983-2003 occurred in the late 1990s and early 2000s’’, following a more stable period in the 1980s°.

Between 2003 and 2010, we estimate a mass loss of 1,305 ± 132 Gt (186.4 ± 18.9 Gt yr⁻¹), based on the ice mask we employed (Fig. 2c); when we used the same ice mask as ref. 18 (Methods, Extended Data Fig. 3) we obtained a mass loss of 250.1 ± 21.2 Gt yr⁻¹, which is comparable to other studies”!”. We find that the 2003-2010 mass loss rate more than doubled, not only relative to the 1983-2003 period but also relative to the net mass loss rate throughout the twentieth century. This latter observation corroborates other studies that have inferred accelerated mass loss in the early twenty-first century relative to the late twentieth century*>’®. Many areas currently undergoing changes are identical to those that underwent considerable thinning throughout the twentieth century, with the exception of Helheim Glacier and Nioghalvfjerdsfjorden Glacier (Fig. 2a-c). Consequently, comparing the twentieth-century thinning pattern with that of the last decade, and assuming a similar warming pattern, we suggest that the present mass loss pattern will persist in the near future, at least until major marine outlet glaciers become land-terminating. This suggestion may be biased, however, because recent observations from northeast Greenland indicate a considerable acceleration in mass loss from Nioghalvfjerdsfjorden Glacier, following at least 20 years of dormancy, and from Zachariae Isstrøm!®.
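The "more than doubled" comparison can be made explicit with the period rates. In the Python sketch below, taking 1900-2003 as the "twentieth-century" window for the time-weighted mean rate is our assumption; the names are illustrative.

```python
# (start year, end year, mean mass loss rate in Gt/yr) from the text.
PERIODS = {
    "1900-1983": (1900, 1983, 75.1),
    "1983-2003": (1983, 2003, 73.8),
    "2003-2010": (2003, 2010, 186.4),
}


def cumulative(period):
    """Total mass loss over one period, in Gt."""
    start, end, rate = PERIODS[period]
    return rate * (end - start)


def mean_rate(periods):
    """Time-weighted mean rate over consecutive periods, in Gt/yr."""
    total_mass = sum(cumulative(p) for p in periods)
    total_years = sum(PERIODS[p][1] - PERIODS[p][0] for p in periods)
    return total_mass / total_years


recent = PERIODS["2003-2010"][2]
print(recent / PERIODS["1983-2003"][2])                # ratio to 1983-2003
print(recent / mean_rate(["1900-1983", "1983-2003"]))  # ratio to 1900-2003 mean
```

Both ratios come out at about 2.5, and 186.4 Gt yr⁻¹ × 7 yr ≈ 1,305 Gt matches the period total given above.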

To assess the SMB and ice dynamic components of the twentieth-century mass balance, we use updated SMB estimates from ref. 10 (Fig. 2d-f), which have been refined by implementing a more physically based meltwater retention scheme and by calibrating for better agreement with RACMO2.1/GR"! during the period 1960-2012 (Methods). The ice dynamic residual is calculated by subtracting the surface lowering caused by SMB processes from the reconstructed total mass balance (Fig. 2g-i) and is largely similar to the SMB pattern, though with positive values in the ablation zone and negative values in the accumulation zone. This general pattern is suggestive of an ice sheet close to balance; however, the residual also includes elevation trends due to forcing that is not included in the SMB model we employ. Perhaps unsurprisingly, we find a large dynamic contribution to the mass balance in the southeast and northwest, both dominated by marine-terminating glaciers, whereas in other regions the land-terminating ice sheet margin exhibits a positive dynamic mass contribution that compensates for the lowering of the ice surface due to SMB processes. Our results suggest that the variability of the dynamic term of the GIS mass balance across the three intervals, LIAmax(1900)-1983, 1983-2003 and 2003-2010, is smaller than its associated uncertainties (Fig. 3a). Previous results have attributed the mass loss in 2000-2008 equally to decreasing SMB and to increasing discharge”, while estimates for more recent periods suggest that decreasing SMB is becoming the dominant driver of increasing mass loss'*?!. Here we find that although short-term dynamic variability may affect the mass balance!®?!-?3, on a centennial timescale the dominant driver of changes in the GIS mass balance so far appears to be variability in SMB (Fig. 3a).
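The claim that the dynamic term varies less than its uncertainties, while the SMB term does not, can be illustrated with the GIS-wide values from Table 1. The Python sketch below compares the spread of each term across the three periods with its 1σ uncertainties. This is a crude illustrative check, not the statistical treatment used in the paper; the names are illustrative.

```python
# GIS-wide values (Gt/yr) and 1-sigma uncertainties from Table 1, ordered
# LIAmax(1900)-1983, 1983-2003, 2003-2010.
SMB = [(375.6, 84.5), (427.6, 96.2), (278.7, 62.7)]
DYNAMIC = [(-450.7, 89.5), (-501.4, 104.4), (-465.2, 65.5)]


def spread(series):
    """Difference between the largest and smallest central value."""
    values = [v for v, _ in series]
    return max(values) - min(values)


def smallest_sigma(series):
    return min(s for _, s in series)


def largest_sigma(series):
    return max(s for _, s in series)


print("dynamic: spread", round(spread(DYNAMIC), 1), "vs smallest sigma", smallest_sigma(DYNAMIC))
print("SMB:     spread", round(spread(SMB), 1), "vs largest sigma", largest_sigma(SMB))
```

The dynamic residual spans about 51 Gt yr⁻¹, less than even its smallest uncertainty, whereas the SMB term spans about 149 Gt yr⁻¹, more than its largest uncertainty.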

The temporal variability of the mass balance during the twentieth century is computed as the difference between the updated SMB
Figure 3 | Mass balance and implications for GMSL. a, Revised estimates of SMB from ref. 10 (orange bars), the ice dynamic residual (DR, yellow bars), mass balance based on the geodetic method (MB, dark brown bars), and mass balance based on the temporal mass balance approach (grey bars), covering the three periods LIAmax(1900)-1983, 1983-2003 and 2003-2010. Black lines represent the associated 1σ uncertainty ranges. The results suggest that variability in SMB affects long-term mass loss more strongly than does dynamic variability, which on a centennial timescale is more constant. b, The orange trace shows the 5-year running mean of the revised SMB estimates from ref. 10, the blue line represents the ice discharge modelled as a function of runoff using a 6-year trailing mean, and the dotted grey and solid grey lines show the yearly and 5-year running mean mass balance, respectively. The shaded areas reflect the associated 1σ uncertainty range (Methods). c, Cumulative mass change since LIAmax(1900) from the geodetic approach (brown line) and from the temporal mass balance reconstruction (grey line); the shading gives the 1σ uncertainty ranges. d, The bars show the contribution of mass loss from the GIS relative to different solutions of the twentieth-century GMSL rise from ref. 26 (H15, light green), ref. 27 (J14, dark green) and ref. 28 (CW11, green). Our result shows the minimum relative input of the GIS to sea level rise, which ranges between 10% and 18% during LIAmax(1900)-2010, supporting a substantial contribution from Greenland during the twentieth century.
estimates of ref. 10 and modelled ice discharge derived as a function of runoff*4, using a 6-year trailing mean, and ice discharge data from ref. 21 (Methods). During the period LIAmax(1900)-2010 we find a mass loss of 10,071 ± 8,580 Gt, which, despite the use of a smaller ice mask, is slightly higher than that of ref. 5. Although this ancillary temporal mass balance method is particularly sensitive to the ice discharge proxy employed, we find good absolute agreement with the mass loss of 9,013 ± 3,378 Gt found using the geodetic method presented above; this adds constraints and confidence to the results presented here.
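The temporal approach combines yearly SMB with discharge estimated from smoothed runoff. The NumPy sketch below implements the two smoothing operations named in the text (a 6-year trailing mean and a 5-year running mean) and the subtraction MB = SMB − D. Several details are assumptions. The runoff-discharge relation of the cited study is not given here, so the sketch takes it as a caller-supplied function, and the linear relation in the usage lines is purely hypothetical. The 5-year running mean is also assumed to be centred.

```python
import numpy as np


def trailing_mean(x, window):
    """Mean of each value and the (window - 1) values before it.

    The first (window - 1) entries are NaN because the window is incomplete.
    """
    x = np.asarray(x, dtype=float)
    out = np.full(x.shape, np.nan)
    csum = np.concatenate(([0.0], np.cumsum(x)))
    out[window - 1:] = (csum[window:] - csum[:-window]) / window
    return out


def centred_running_mean(x, window):
    """Centred running mean for an odd window; NaN where the window is incomplete."""
    if window % 2 == 0:
        raise ValueError("use an odd window for a centred running mean")
    x = np.asarray(x, dtype=float)
    out = np.full(x.shape, np.nan)
    half = window // 2
    csum = np.concatenate(([0.0], np.cumsum(x)))
    out[half:x.size - half] = (csum[window:] - csum[:-window]) / window
    return out


def temporal_mass_balance(smb, runoff, discharge_from_runoff, window=6):
    """Yearly mass balance = SMB minus discharge estimated from smoothed runoff.

    `discharge_from_runoff` stands in for the runoff-discharge relation of
    the cited study, which is not reproduced here.
    """
    smoothed_runoff = trailing_mean(runoff, window)
    return np.asarray(smb, dtype=float) - discharge_from_runoff(smoothed_runoff)


# Usage with made-up numbers and a hypothetical linear relation D = 3 * runoff.
smb = np.full(10, 10.0)
runoff = np.full(10, 2.0)
mb = temporal_mass_balance(smb, runoff, lambda r: 3.0 * r)
mb_smoothed = centred_running_mean(mb, 5)
```

The cumulative-sum form keeps both means O(n). Leaving incomplete windows as NaN, rather than shrinking the window at the ends, is a design choice that makes the start-up years of the record explicit.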

Our temporal mass balance method suggests considerable variability in the mass balance during the twentieth century (Fig. 3b). The most negative mass balance rates occurred during the late 1920s and early 1930s, a period during which the rate of air temperature increase was higher than during the past decade!*”®, and which also coincides with extensive glacier retreat in southeast Greenland". Following substantially lower, or even near-zero, mass loss rates during the 1940s, our model results suggest mass loss rates during the 1950s and 1960s that are similar to those observed during the late 1990s and the early twenty-first century”. For the period from the 1960s to the 1980s, our results are comparable to other modelling results that generally suggest net mass loss during the 1960s and an ice sheet near balance during the 1970s to 1980s (ref. 3).

In the Fifth Assessment Report of the Intergovernmental Panel on Climate Change’, the twentieth-century global-mean sea level (GMSL) budget was assessed by comparing estimates derived from tide gauges against observations of the different contributors, leaving an unassigned residual sea level rise during 1901-1990. However, ref. 8 includes no observational records of the contribution from the GIS or the Antarctic Ice Sheet before 1993. The failure to close the GMSL budget for the period 1901-1990 has been attributed to underestimation of the individual contributions, including those of the polar ice sheets’*. A recent study recalculated the twentieth-century GMSL using a probabilistic technique and found a considerably lower rate of twentieth-century GMSL rise before 1993, thus closing the budget without including contributions from the polar ice sheets”*. However, our results show that during the twentieth century the GIS contributed substantially to GMSL rise (Fig. 3c).

In particular, the geodetic approach, which is based on observations from aerial imagery and indicates considerable thinning along the margin of the ice sheet, is regarded as providing a conservative minimum estimate of mass loss (Methods). Using the geodetic approach, we find a total mass loss of 9,013 ± 3,378 Gt from LIAmax(1900) to 2010, equivalent to 25.0 ± 9.4 mm of GMSL rise. Using our temporal mass balance method, we find a mass loss of 10,071 ± 8,580 Gt, equivalent to 28.0 ± 23.8 mm of GMSL rise. Our results thus suggest that the GIS has contributed significantly to the twentieth-century sea level budget. Combining our geodetic-based results with recent GMSL reconstructions***8 shows that the contribution of the GIS to GMSL rise ranged between 11% and 17% in 1900-1983, between 10% and 16% in 1983-2003, and between 11% and 18% in 2003-2010 (Fig. 3d). Using the same ice mask as ref. 18, we find that during 2003-2010 the contribution to sea level rise ranged between 15% and 24%.
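The millimetre equivalents above follow from dividing the added meltwater volume by the ocean area. The Python sketch below assumes an ocean surface area of 3.6 × 10¹⁴ m² and a meltwater density of 1,000 kg m⁻³; the text does not state its constants, so both are our assumptions. Under these values, 1 mm of GMSL corresponds to about 360 Gt, which reproduces the quoted pairs. The sketch also shows that the LIAmax(1900)-2010 total and its uncertainty equal the linear sums of the three period totals, to within 1 Gt of rounding.

```python
OCEAN_AREA_M2 = 3.6e14   # assumed global ocean surface area, m^2
WATER_DENSITY = 1000.0   # kg m^-3, meltwater added to the ocean
KG_PER_GT = 1.0e12


def gt_to_mm_gmsl(mass_gt: float) -> float:
    """Convert an ice mass change (Gt) to global-mean sea level (mm)."""
    volume_m3 = mass_gt * KG_PER_GT / WATER_DENSITY
    return volume_m3 / OCEAN_AREA_M2 * 1000.0


# Period totals and uncertainties (Gt) quoted in the text for
# LIAmax(1900)-1983, 1983-2003 and 2003-2010.
period_totals = [6233, 1475, 1305]
period_sigmas = [2436, 809, 132]
total, sigma = sum(period_totals), sum(period_sigmas)

print(f"geodetic: {total} +/- {sigma} Gt = "
      f"{gt_to_mm_gmsl(total):.1f} +/- {gt_to_mm_gmsl(sigma):.1f} mm")
print(f"temporal: {gt_to_mm_gmsl(10071):.1f} +/- {gt_to_mm_gmsl(8580):.1f} mm")
```

Summing the period uncertainties linearly, rather than in quadrature, treats the period errors as fully correlated, which gives the more conservative bound.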

Thus far, any attempt to reconstruct long-term surface elevations beyond the scope of individual outlet glaciers has been prevented by the lack of a suitable Greenland-wide elevation model that would allow accurate observations of moraine and trimline heights representing the maximum ice sheet extent during the LIA. Our study provides 110 years of spatial and temporal mass balance of the GIS and, in addition, centennial estimates of the SMB and dynamic terms of the mass balance. Finally, our conservative, observation-based results, showing considerable mass loss from the GIS during the twentieth century, minimize the unassigned residual GMSL rise during 1901-1990. This will help to close the twentieth-century GMSL budget, which is crucial for evaluating the reliability of modelling contributions to past sea level rise, and hence for increasing confidence in projections of sea level rise’*.
METHODS

Elevation changes between LIAmax and 1978-87 are derived from direct observations of LIAmax moraines and trimlines and of the ice surface in vertical stereo photogrammetric imagery recorded during 1978-87. Changes are extrapolated to the ice sheet interior using a scale-value approach based on aerial and satellite altimeter data from the period 2003-2010, with site-specific interpolations in 82 basins around the ice sheet. The same scaling approach is used to derive changes from 1978-87 to 2003. For the entire LIAmax-2010 period we use annual estimates from an SMB model!” to quantify mass balance processes and to assess the temporal variability of the mass balance components. In-depth descriptions of the methods used are provided below.

Geometric approach to derive surface elevation changes. Previously, mass balance estimates of the entire GIS have been based on modelling efforts that rely on empirical relations between SMB and ice discharge*° or on energy balance modelling®’. Geometric approaches, based on mapping trimlines and lateral and end moraines, have been applied to Jakobshavn Isbræ'?, to outlet glaciers in Patagonia*! and to land-terminating glaciers on James Ross Island, Antarctica*’. These studies, however, focus on a single outlet glacier of the GIS, smaller ice caps or small isolated glaciers, respectively. Each study used different methods to account for elevation changes at higher elevations inland. For example, upper boundary changes were found by vertically shifting the contemporary equilibrium line altitude based on lapse-rate temperature reconstructions*/. Alternatively, the vertical difference between trimlines and ice surfaces was used to provide elevation offsets and thereby estimate mass loss*.

Here, we outline the geometric method we deployed, which allows us to translate point observations of former ice margin positions into an ice-sheet-wide mass balance. The height of an equilibrium glacier or ice sheet profile can be expressed as (see, for example, ref. 33):


$$h(x) = H\left[1 - \left(\frac{x}{L}\right)^{(n+1)/n}\right]^{n/(2n+2)} \qquad (1)$$


where H is the surface elevation at the ice divide, L is the length of the ice sheet profile, h is the surface elevation at distance x from the ice divide, and the exponent n is a constant. This relation assumes no sliding, a flat bed, uniform accumulation and constant flowband width. Here, we apply it to show that, given elevation changes between times t1 and t2, it is possible to estimate elevation changes during another period by scaling the elevation changes of the known period. That period may overlap the known one (for example, t1 to t3) or extend beyond it (for example, t3 to t4). Subsequently, the approach is assessed using observations at three main outlet glaciers in Greenland: Kangerlussuaq Glacier, Helheim Glacier and Jakobshavn Isbræ.

First, however, we consider three ice surface profiles h, from the ice divide to the ice margin, using typical values for the GIS. Each surface elevation profile represents one of the time steps t1, t2 and t3. In this example, the glacier length from the ice divide, L, is 200,000 m at t1 and changes by 1,000 m at each time step, while the ice divide height H is kept constant at 3,300 m and the exponent n is set to 3 (Extended Data Fig. 1a). We simulate surface changes by changing the glacier length L (corresponding to advance or retreat of the ice margin), so surface elevation changes at x are governed by the total length of the ice profile. In our example, the ice retreats by 1,000 m from the initial time step t1 and the ice surface lowers at t2. Next, we predict the ice profile at t3 by applying a scale value, and define the predicted profile at t3 (h_pre_t3) (Extended Data Fig. 1b) as:


$$h_{\mathrm{pre\_t3}} = h_{t1} + S\,(h_{t2} - h_{t1}) \qquad (2)$$


where S is a constant. 
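The synthetic experiment above can be reproduced in a few lines. The NumPy sketch below assumes that equation (1) takes the standard equilibrium-profile form h(x) = H[1 − (x/L)^((n+1)/n)]^(n/(2n+2)). It uses the stated example values (H = 3,300 m, n = 3, L = 200 km at t1, 1 km retreat per step) and applies equation (2) with S = 2.2. The function names are illustrative, and the sketch is not the authors' code.

```python
import numpy as np

H = 3300.0         # ice-divide elevation, m
N_EXP = 3          # exponent n
L_T1 = 200_000.0   # profile length at t1, m
STEP = 1_000.0     # margin retreat per time step, m


def profile(x, length, divide_height=H, n=N_EXP):
    """Equilibrium surface elevation h(x) at distance x from the ice divide.

    Assumed form: h = H * [1 - (x/L)^((n+1)/n)]^(n/(2n+2)); h = 0 beyond the margin.
    """
    ratio = np.clip(np.asarray(x, dtype=float) / length, 0.0, 1.0)
    return divide_height * (1.0 - ratio ** ((n + 1) / n)) ** (n / (2 * n + 2))


def predict(h_t1, h_t2, scale):
    """Equation (2): h_pre_t3 = h_t1 + S (h_t2 - h_t1)."""
    return h_t1 + scale * (h_t2 - h_t1)


# Stay inland of the t3 margin (198 km) so all three profiles are ice-covered.
x = np.linspace(0.0, 190_000.0, 1901)
h_t1, h_t2, h_t3 = (profile(x, L_T1 - k * STEP) for k in range(3))

h_pre_t3 = predict(h_t1, h_t2, scale=2.2)
misfit = h_t3 - h_pre_t3

interior = x < 100_000.0
near_margin = x > 180_000.0
print("max |misfit| inland of 100 km:", np.abs(misfit[interior]).max())
print("max |misfit| beyond 180 km:   ", np.abs(misfit[near_margin]).max())
print("max thinning t1->t2, near margin vs inland:",
      np.abs(h_t2 - h_t1)[near_margin].max(), np.abs(h_t2 - h_t1)[interior].max())
```

By our hand check, with these illustrative settings the misfit stays around or below 1 m inland of 100 km and reaches several metres within 20 km of the t1 margin. The thinning between t1 and t2 is likewise an order of magnitude larger near the margin than inland. This is qualitatively the behaviour described for Extended Data Fig. 1c, e, f.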

Comparing the elevation changes between h_t1 and h_t3 (dh_t1t3) with those between h_t1 and h_pre_t3 (dh_t1t3_pre), derived using equation (2) and an S value of 2.2, shows overall agreement, though with differences near the margin (Extended Data Fig. 1c). However, here the surface profile h_t1 is part of both the input and the output. To generate a predicted difference in which the same time stamp (for example, t1) is not incorporated in both the input and the output, S can be altered and an ‘independent’ dh estimate calculated. Here, h_t2 − h_t1 (dh_t1t2) and an S value of 1.2 simulate dh_t3t4_pre, which shows overall agreement with h_t4 − h_t3 (dh_t3t4), but again with differences near the margin (Extended Data Fig. 1d). Nevertheless, this implies that if dh_t1t2 and dh_t3t4 are both known, the constant S can be derived as the ratio between these values.
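Per the Methods overview, S is used to carry point observations inland by scaling the gridded 2003-2010 altimetry change. The Python sketch below illustrates that idea. S is estimated as the ratio of the two known changes at margin points, then applied to upstream cells. The threshold that skips near-zero reference changes and the use of the median are our illustrative choices, not the paper's procedure, and the input numbers are hypothetical.

```python
import numpy as np


def scale_value(dh_reference, dh_target, min_abs_change=0.5):
    """Estimate S = dh_target / dh_reference from points where both are known.

    Points with |dh_reference| < min_abs_change (m) are skipped because the
    ratio is unstable there; the median of the remaining ratios is returned.
    Both the threshold and the use of the median are illustrative choices.
    """
    ref = np.asarray(dh_reference, dtype=float)
    tgt = np.asarray(dh_target, dtype=float)
    usable = np.abs(ref) >= min_abs_change
    if not usable.any():
        raise ValueError("no points with a large enough reference change")
    return float(np.median(tgt[usable] / ref[usable]))


def apply_scale(dh_reference_grid, scale):
    """Scale a gridded reference change (e.g. 2003-2010) to the target period."""
    return scale * np.asarray(dh_reference_grid, dtype=float)


# Hypothetical margin points (metres): 2003-2010 altimetry change and the
# change since LIAmax mapped from trimlines and moraines.
dh_2003_2010 = [-2.0, -4.0, 0.1]
dh_since_lia = [-4.4, -8.8, 5.0]
S = scale_value(dh_2003_2010, dh_since_lia)  # the third point is skipped
upstream = apply_scale([-1.0, -3.0], S)      # hypothetical inland cells
```

In a basin-by-basin application, one would compute a separate S for each of the 82 basins rather than a single ice-sheet-wide value.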

Extended Data Fig. 1e shows the difference between profile h_t3 and h_pre_t3 using a constant S. Over large parts of the profile the difference is small (<1 m); near the margin, however, differences increase to tens of metres. We use this difference as an expression of the uncertainty of the constant S, denote it σ_smeth and include it in our mass balance uncertainty calculations. Extended Data Fig. 1f shows the change in elevation (in metres) between two time stamps as a function of surface elevation. Thinning is largest at lower elevations, but it drops rapidly and becomes close to 0 at h > 2,500 m.

We note that, given the differences near the margin (Extended Data Fig. 1e), the profile approach employed does not work well near the terminus of marine-terminating outlet glaciers. As discussed in Methods section ‘Uncertainties and conservative mass balance estimates’, however, we use an ice mask derived from aerial images recorded during 1978-87. The large differences between simulated and predicted surface profiles at low elevations, that is, σ_smeth (Extended Data Fig. 1e), are therefore not included in the estimate for the period LIAmax(1900) to 1978-87. Neither are the large elevation changes at low elevations (Extended Data Fig. 1f).

The approach presented here is founded on the relation in equation (1), for which certain assumptions are made. These assumptions are violated for a large part of the ice sheet. For instance, basal sliding is considerable near marine-terminating outlet glaciers, which together drain 88% of the ice sheet, and over the majority of the ice sheet basal sliding dominates over internal deformation™. Extended Data Fig. 2 provides three examples in which we apply our approach to major marine-terminating outlet glaciers. Here, we compare elevation changes derived using our scaling approach with elevation changes derived from the DEM (see Methods section ‘Photogrammetric DEM 1978-87’) and from 2003 NASA Airborne Topographic Mapper (ATM) flight lines*®. We find good agreement (within uncertainties) between the observed and predicted elevation change rates. The examples in Extended Data Fig. 2 illustrate the validity of our approach in fast-flowing areas. There, basal sliding is considerable, the bed is not flat, accumulation is non-uniform and the width of the flowband is not constant; the assumptions of equation (1) are thus violated. Moreover, the combined uncertainty that we estimate includes three terms. The first is an uncertainty related to the scaling approach (σ_smeth). The second is an error related to changes during 2003-2010 (dh_solid; see Methods section ‘2003-2010 elevation changes from air- and space-borne laser altimetry’). The third is an uncertainty related to the scaling of point-based observations (see Methods section ‘LIAmax to 1978-87 mass balance’). The combined uncertainty estimate therefore accounts for the scaling of the observations and incorporates the variability between observations and dh_solid. Thus, we regard the comparison illustrated in Extended Data Fig. 2 as a validation of our approach to derive ice-sheet-wide mass balance estimates.
2003-2010 elevation changes from air- and space-borne laser altimetry. To detect ice surface elevation changes from April 2003 to April 2010, which serve as the base data set from which to calculate a scale value, we use all available Ice, Cloud, and land Elevation Satellite (ICESat) GLA12 Release 31 data. ICESat elevations have a crossover standard deviation of $\sigma_{\mathrm{ICESat}} = 0.2$ m (refs 37-39). Furthermore, we use all available NASA ATM flight lines (ref. 35) between 2003 and 2010, and NASA's Land, Vegetation, and Ice Sensor (LVIS) flight lines from 2010 (ref. 40), both of which have an uncertainty of 0.1 m. Ice surface elevation changes and associated uncertainties during the period April 2003 to April 2010 are derived in 1 km × 1 km cells and converted into an ice sheet surface elevation change grid ($dh_{2003\text{-}2010}$).

Using SMB fields from RACMO2.1/GR output (ref. 11), the elevation change due to firn compaction is calculated and subtracted from the total elevation change ($dh_{2003\text{-}2010}$), thereby yielding an elevation change due to solid ice changes ($dh_{\mathrm{solid}}$) on a 1 km × 1 km grid.
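A minimal sketch of this gridding and firn correction in Python, assuming point elevation changes (m) with projected x/y coordinates in metres, a plain per-cell mean to combine points that fall in the same 1 km cell (the combination rule is not stated above), and a firn-compaction elevation-change grid that has already been computed from the SMB forcing. The function name `grid_mean` and the example values are hypothetical.

```python
import numpy as np

def grid_mean(x, y, values, x0, y0, nrows, ncols, cell=1000.0):
    """Average point observations into square cells; NaN where a cell has no points.

    x, y are projected coordinates in metres (assumed); (x0, y0) is the lower-left grid corner.
    """
    x, y, values = (np.asarray(a, dtype=float) for a in (x, y, values))
    col = np.floor((x - x0) / cell).astype(int)
    row = np.floor((y - y0) / cell).astype(int)
    inside = (row >= 0) & (row < nrows) & (col >= 0) & (col < ncols)
    sums = np.zeros((nrows, ncols))
    counts = np.zeros((nrows, ncols))
    np.add.at(sums, (row[inside], col[inside]), values[inside])
    np.add.at(counts, (row[inside], col[inside]), 1.0)
    with np.errstate(invalid="ignore", divide="ignore"):
        return np.where(counts > 0, sums / counts, np.nan)

# Total elevation change on the 1 km grid, then remove firn compaction to get dh_solid.
dh_total = grid_mean([100, 900, 1500], [100, 900, 100], [-1.0, -3.0, -5.0],
                     x0=0.0, y0=0.0, nrows=2, ncols=2)
dh_firn = np.full((2, 2), -0.5)      # hypothetical firn-compaction elevation change (m)
dh_solid = dh_total - dh_firn        # cells without observations stay NaN
```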

As part of our calculation, we divide the ice sheet into drainage basins (Extended Data Fig. 3). Here, we use the drainage basins from ref. 44 divided into sub-basins, and we include additional areas around the ice sheet margin, yielding a total of 82 basins. Some areas on the southeast coast were omitted due to the lack of LIA input data, mainly caused by extensive snow cover at the time of acquisition of the aerial stereophotographs. Additionally, we use the Randolph Glacier Inventory, version 3.2 (ref. 45), to exclude glaciers not connected to the ice sheet and those only weakly connected (RGIFlag CL0 and CL1, respectively).
Measuring LIA elevations from aerial photographs. To detect ice surface 
elevation changes from LIAmax to 1978-87 we use aero-triangulated vertical 
stereo photogrammetric imagery recorded during 1978-1987. The images were 
recorded between late July and mid-August from an altitude of 13,500 m to a scale 
of 1:150,000. They are part of a larger collection of images covering the entire 
ice-free part of Greenland, processed at the Anthropocene and Quaternary 
Research Group of the Centre for GeoGenetics, Natural History Museum 
of Denmark. 

The aerial photographs were processed in the SOCET SET 5.6 software package written by BAE Systems using GR96 aero-analytical triangulated control points surveyed with GPS and provided by The Danish Geodata Agency, a part of the Danish Ministry of Energy, Utilities and Climate. The processed aerial photographs allow us to survey trimlines, ice margins and moraines outlining the LIAmax in three dimensions with high accuracy. In the survey, two types of points have been defined and measured: type 1, trimline or lateral moraines; and type 2, an active front.

Each of these types contains two surveyed data points: the LIAmax and the 1978-87 position and elevation of the ice margin (Extended Data Fig. 4). For type 1, the LIAmax extent is determined from the trimline between the non-eroded and the freshly ice-scoured bedrock, or from lateral moraines. Type 2 determines the position of end moraines or other geomorphic evidence of recently glacier-overridden landscape.

The data type distribution is illustrated in Extended Data Fig. 5a, while Extended
Data Fig. 5b illustrates the elevation difference at 3,003 points between LIAmax and 
1978-87 derived from 6,006 manual point measurements. 

The elevation differences derived from the three-dimensional stereo-photogrammetric single-point survey ($dh_{\mathrm{LIA}}$ values) are assigned an uncertainty of 1 m, as almost all systematic error affecting the triangulation of the images is eliminated. Moreover, since the LIAmax extent is mapped on the 1978-87 images, we can ignore post-depositional effects on the moraines and the glacial isostatic adjustment correction.
LIAmax to 1978-87 mass balance. The ice mass balance since the LIAmax is calculated by scaling $dh_{\mathrm{LIA}}$ values to the elevation changes between 2003 and 2010 ($dh_{\mathrm{solid}}$). We use the $dh_{\mathrm{LIA}}$ points at outlet glaciers of variable sizes (land- and marine-terminating) as well as other areas of the ice margin to determine the scale value ($S_{\mathrm{LIA}}$), derived as the ratio between the point-based $dh_{\mathrm{LIA}}$ and $dh_{\mathrm{solid}}$ of the closest grid cell. This implies that the shape of the ice profiles for different timestamps is not (directly) used; rather, we use $dh$ point values that show the point-based thinning pattern along the periphery to derive the ice-sheet-wide thinning pattern. Subsequently, the $S_{\mathrm{LIA}}$ values found for each glacier are interpolated using the weighted mean to a regular 1 km × 1 km grid for each of the 82 calculation basins (Extended Data Fig. 3). For each grid point we predict an $S$ value and assign an uncertainty, $\sigma_{S(\mathrm{LIA\_rms})}$, based on the root mean square of the predicted values within the basin. However, the total uncertainty $\sigma_{S(\mathrm{LIA})}$ of the $S$ values has to account for $\sigma_{S(\mathrm{meth})}$ (see Methods section ‘Geometric approach to derive surface elevation changes’). Thus, for each grid point $i$ we obtain:


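$$\sigma^{i}_{S(\mathrm{LIA})} = \sqrt{\left(\sigma^{i}_{S(\mathrm{LIA\_rms})}\right)^{2} + \left(\sigma_{S(\mathrm{meth})}\right)^{2}} \tag{3}$$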


Next, the elevation changes between LIAmax and 1978-87 are calculated by multiplying the $S_{\mathrm{LIA}}$ grid and the elevation change due to solid ice changes ($dh_{\mathrm{solid}}$) between 2003 and 2010:


$$dh^{i}_{\mathrm{LIA}} = S^{i}_{\mathrm{LIA}}\, dh^{i}_{\mathrm{solid}} \tag{4}$$


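where $i$ represents each cell on a regular 1 km × 1 km grid. Because $dh_{\mathrm{solid}}$ excludes elevation changes due to firn compaction (see Methods section ‘2003-2010 elevation changes from air- and space-borne laser altimetry’), we thereby obtain estimates of the mass balance.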
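The scaling step in equations (3) and (4) can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the text specifies only a "weighted mean", so inverse-distance weighting with power 2 is our assumption; basin-by-basin processing and the RMS-based $\sigma_{S(\mathrm{LIA\_rms})}$ are not reproduced, and all function names are hypothetical.

```python
import numpy as np

def point_scale_values(dh_lia_pts, rows, cols, dh_solid):
    """S_LIA per point: point dh_LIA divided by dh_solid of the closest grid cell (row, col)."""
    return np.asarray(dh_lia_pts, dtype=float) / dh_solid[rows, cols]

def idw_to_grid(px, py, s_pts, gx, gy, power=2.0):
    """Inverse-distance-weighted mean of point S values at grid-cell centres (assumed weighting)."""
    d = np.hypot(gx[..., None] - px, gy[..., None] - py)   # (ny, nx, n_points) distances
    w = 1.0 / np.maximum(d, 1e-9) ** power                 # guard against a point on a cell centre
    return (w * s_pts).sum(axis=-1) / w.sum(axis=-1)

def sigma_s(sigma_rms, sigma_meth):
    """Equation (3): combine the RMS-based and method uncertainties in quadrature."""
    return np.sqrt(np.asarray(sigma_rms) ** 2 + sigma_meth ** 2)

def dh_lia(s_grid, dh_solid):
    """Equation (4): cell-by-cell product of the scale grid and the 2003-2010 change."""
    return s_grid * dh_solid
```

In the setting described above, the interpolation would be run separately for each of the 82 calculation basins.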

To each value of $dh_{\mathrm{LIA}}$ we assign an uncertainty as follows:


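$$\sigma^{i}_{dh_{\mathrm{LIA}}} = \left|dh^{i}_{\mathrm{LIA}}\right| \sqrt{\left(\frac{\sigma^{i}_{S(\mathrm{LIA})}}{S^{i}_{\mathrm{LIA}}}\right)^{2} + \left(\frac{\sigma^{i}_{dh_{\mathrm{solid}}}}{dh^{i}_{\mathrm{solid}}}\right)^{2}} \tag{5}$$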
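Equations (5) and (9) are first-order propagation of relative errors for a product. A short Python sketch, with a hypothetical function name; the absolute value keeps the uncertainty positive for thinning (negative $dh$), and non-zero $S$ and $dh_{\mathrm{solid}}$ are assumed:

```python
import numpy as np

def sigma_product(dh, s, sigma_s, dh_solid, sigma_dh_solid):
    """Uncertainty of dh = S * dh_solid from the relative uncertainties of S and dh_solid."""
    rel = np.sqrt((sigma_s / s) ** 2 + (sigma_dh_solid / dh_solid) ** 2)
    return np.abs(dh) * rel

# Example: S = 2.0 +/- 0.2 and dh_solid = -3.0 +/- 0.3 m give dh = -6.0 m.
sigma = sigma_product(-6.0, 2.0, 0.2, -3.0, 0.3)   # 6 * sqrt(0.01 + 0.01), about 0.85 m
```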
The calculation allows us to ignore the actual timing of the maximum extent during the LIA because it is mapped on the 1978-87 images, making it directly applicable to derive the net ice mass balance between LIAmax and 1978-87. However, to obtain a rate we assign 1900 as a Greenland-wide time stamp of when the glaciers started to retreat following the LIA, although we note that there is regional and local variability, and we use 1983 as the average year of the aerial imagery.
Photogrammetric DEM 1978-87. We produced a 25 m × 25 m digital elevation model (DEM1978/87) using the vertical stereo photogrammetric imagery recorded during 1978-1987 following a standard approach. The DEM is processed into WGS84 ellipsoid heights, directly comparable to ICESat, ATM, and LVIS data.

Our validation methodology is based upon co-registration methods that relate the three-dimensional co-registration vector between two elevation surfaces to terrain slope $\alpha$ and aspect $\psi$ (refs 47 and 48). The co-registration parameters are determined by robust least-squares minimization of the elevation differences over stable terrain between the DEM tiles and ICESat ($dh$) using:


$$dh = a\cos(b - \psi)\tan\alpha + c \tag{6}$$


where $a$ and $b$ are the magnitude and direction, respectively, of the horizontal co-registration vector and $c$ is the mean vertical bias between the two elevation data sources.


We perform the co-registration on a 50 km × 50 km grid over the entire DEM. All slopes less than 5° are removed and a curvature filter is applied to remove regions where resolution differences between the data sets may cause spurious elevation differences.
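A sketch of the co-registration fit in equation (6) for one 50 km × 50 km tile, assuming stable-terrain samples with slope and aspect in degrees. To keep the least-squares problem linear we fit $A = a\cos b$ and $B = a\sin b$, since $a\cos(b-\psi) = A\cos\psi + B\sin\psi$, and recover $a$ and $b$ afterwards. The robust loss (SciPy's `soft_l1`) is our choice because the text does not name the robust estimator; the curvature filter is omitted and the function name is hypothetical.

```python
import numpy as np
from scipy.optimize import least_squares

def fit_coregistration(dh, slope_deg, aspect_deg, min_slope_deg=5.0):
    """Robust fit of dh = a*cos(b - psi)*tan(alpha) + c over stable terrain.

    Returns a (horizontal shift magnitude), b (its direction, degrees) and c (vertical bias).
    """
    dh, slope_deg, aspect_deg = (np.asarray(v, dtype=float) for v in (dh, slope_deg, aspect_deg))
    keep = slope_deg >= min_slope_deg              # discard slopes below 5 degrees
    tan_a = np.tan(np.radians(slope_deg[keep]))
    psi = np.radians(aspect_deg[keep])
    obs = dh[keep]

    def residuals(p):
        A, B, c = p                                # A = a*cos(b), B = a*sin(b)
        return (A * np.cos(psi) + B * np.sin(psi)) * tan_a + c - obs

    res = least_squares(residuals, x0=np.zeros(3), loss="soft_l1", f_scale=1.0)
    A, B, c = res.x
    return np.hypot(A, B), np.degrees(np.arctan2(B, A)) % 360.0, c
```

The linear reparametrization avoids the sign ambiguity between $(a, b)$ and $(-a, b + 180°)$ that a direct fit of $a$ and $b$ would have.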

The co-registration parameters are generally less than 15 m horizontally and less than 10 m vertically. At the 1σ confidence level, the aero-photogrammetric DEM has an accuracy of 10 m horizontally and 6 m vertically, while the precision is better than 4 m (Extended Data Fig. 6). We note that the 6 m vertical accuracy of the DEM is different from the 1 m uncertainty related to $dh_{\mathrm{LIA}}$ values obtained in the three-dimensional stereo-photogrammetric single-point survey (see Methods section ‘LIAmax to 1978-87 mass balance’).
1978-87 to 2003 mass balance. The mass balance between 1978-87 and 2003 
is determined using the same approach as outlined for calculating the LIAmax to 
1978-87 mass balance but with different input data. We use ATM data from 2010 
(ref. 35), supplemented with 2009 ICESat data to fill in gaps, to determine the
mass balance between 1978-87 and 2010. Subsequently, we subtract the derived 
mass balance between 2003 and 2010 to determine the 1978-87 to 2003 mass 
balance. 

The merged ATM and ICESat data cover outlet glaciers of variable size and termination regime. At these data points, elevations from the 1978-87 DEMs are extracted, although we remove interpolated DEM surfaces using a reliability mask, an output produced during DEM production. The point-based difference between the ATM/ICESat measured surface elevation and the DEM elevation is $dh_{\mathrm{80s\text{-}10}}$. To accommodate issues related to large differences between simulated and predicted surface profiles (see Methods section ‘Geometric approach to derive surface elevation changes’), we use point observations only up-glacier from the terminus, although the distance varies for individual outlet glaciers with the location of available ATM and ICESat data.

Next, we derive the $S_{\mathrm{80s\text{-}10}}$ value as the ratio between the point-based $dh_{\mathrm{80s\text{-}10}}$ and a $dh_{\mathrm{solid}}$ value extracted from the 1 km × 1 km grid using bilinear interpolation between grid cells. The $S_{\mathrm{80s\text{-}10}}$ values are subsequently interpolated using a weighted mean to a regular 1 km × 1 km grid. Thus, for each grid point we predict an $S$ value and assign an uncertainty, $\sigma_{S(\mathrm{80s\text{-}10\_rms})}$, based on the root mean square of the predicted values within the basin. However, the total uncertainty $\sigma_{S(\mathrm{80s\text{-}10})}$ of the $S$ values has to account for $\sigma_{S(\mathrm{meth})}$ (see Methods section ‘Geometric approach to derive surface elevation changes’). Thus, for each grid point $i$ we obtain:


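$$\sigma^{i}_{S(\mathrm{80s\text{-}10})} = \sqrt{\left(\sigma^{i}_{S(\mathrm{80s\text{-}10\_rms})}\right)^{2} + \left(\sigma_{S(\mathrm{meth})}\right)^{2}} \tag{7}$$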


The elevation change between 1978-87 and 2010 is calculated by multiplying the $S_{\mathrm{80s\text{-}10}}$ grid and the $dh_{\mathrm{solid}}$ (2003-2010) grid:


$$dh^{i}_{\mathrm{80s\text{-}10}} = S^{i}_{\mathrm{80s\text{-}10}}\, dh^{i}_{\mathrm{solid}} \tag{8}$$


where i represents each cell on a regular 1 km x 1km grid. 
To each value of $dh_{\mathrm{80s\text{-}10}}$ we assign an uncertainty of:


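$$\sigma^{i}_{dh_{\mathrm{80s\text{-}10}}} = \left|dh^{i}_{\mathrm{80s\text{-}10}}\right| \sqrt{\left(\frac{\sigma^{i}_{S(\mathrm{80s\text{-}10})}}{S^{i}_{\mathrm{80s\text{-}10}}}\right)^{2} + \left(\frac{\sigma^{i}_{dh_{\mathrm{solid}}}}{dh^{i}_{\mathrm{solid}}}\right)^{2}} \tag{9}$$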


Subsequently, we subtract $dh_{\mathrm{solid}}$ from $dh_{\mathrm{80s\text{-}10}}$ to determine the mass balance from 1978-87 to 2003.
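The point extraction and the final differencing for this period can be sketched with SciPy's `RegularGridInterpolator`, whose `linear` method on a two-dimensional regular grid is bilinear interpolation. The sketch assumes that $dh_{\mathrm{solid}}$ is stored as the total 2003-2010 change in metres (so it can be subtracted directly) on a grid whose rows follow $y$ and whose columns follow $x$; the function names are hypothetical, and the weighted-mean interpolation and equations (7) to (9) follow the same pattern as for the LIAmax period.

```python
import numpy as np
from scipy.interpolate import RegularGridInterpolator

def dh_solid_at_points(x_centres, y_centres, dh_solid, px, py):
    """Bilinear interpolation of the dh_solid grid (rows = y, columns = x) at point locations."""
    interp = RegularGridInterpolator((y_centres, x_centres), dh_solid, method="linear")
    return interp(np.column_stack([py, px]))

def point_scale_80s_10(dh_80s_10_pts, dh_solid_pts):
    """S_80s-10 per point: point dh_80s-10 divided by the interpolated dh_solid."""
    return np.asarray(dh_80s_10_pts, dtype=float) / dh_solid_pts

def dh_80s_to_2003(dh_80s_10_grid, dh_solid_grid):
    """1978-87 to 2003 change: the 1978-87 to 2010 change minus the 2003-2010 change."""
    return dh_80s_10_grid - dh_solid_grid
```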

Uncertainties and conservative mass balance estimates. For the entire ice sheet 
(or individual basins) we calculate the uncertainty of the mass balance estimates 
during LIAmax to 1978-87 and 1978-87 to 2003 as: 


$$\sigma_{\mathrm{tot}} = \sum_{i=1}^{n} \sigma^{i}_{dh_{\mathrm{per}}} \tag{10}$$

where $\sigma^{i}_{dh_{\mathrm{per}}}$ is the uncertainty of each grid point during the period of interest (LIAmax to 1978-87 or 1978-87 to 2003) derived from equation (5) or equation (9), and $n$ is the number of points covering the basin, region, or entire ice sheet being considered.
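Equation (10) is a linear sum of per-cell uncertainties. A linear sum of standard deviations is an upper bound on the combined uncertainty, reached only if all cell errors are perfectly correlated, so it is the more cautious aggregation rule. A minimal sketch, with a hypothetical function name and cells outside the ice mask stored as NaN:

```python
import numpy as np

def total_uncertainty(sigma_cells, basin_mask=None):
    """Equation (10): sum the per-cell uncertainties over a basin, region or the whole grid."""
    s = np.asarray(sigma_cells, dtype=float)
    if basin_mask is not None:
        s = s[np.asarray(basin_mask, dtype=bool)]   # keep only the cells of the chosen basin
    return float(np.nansum(s))                      # NaN cells (outside the mask) are ignored
```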

We regard the derived mass balance estimates between LIAmax and 1978-87 as conservative for a number of reasons. First, when calculating mass balance we are limited by the spatial extent of our ice mask, which implies that mass loss between the boundary of the ice mask (based on the ice extent derived from the 1978-87 aerial images) and the maximum extent of the glaciers during the LIA is not included. This zone of non-included ice loss is largest near marine-terminating glaciers. For example, Jakobshavn Isbræ retreated by about 20 km between LIAmax and 1978-87, while Kangerlussuaq Glacier and Midgaard Glacier retreated by about 12 km and about 20 km, respectively, during the same period. Here, the outer parts of the glaciers may have been afloat during the LIA and would then already have contributed to GMSL rise, while only mass loss up-glacier from the LIAmax grounding line would contribute to post-LIA sea level rise. As the extent of the ice mask is not identical to the LIAmax grounding line, we cannot capture the mass loss between these two.

Second, owing to the lack of LIA data in the southeast, some areas are excluded, as are glaciers not connected to the ice sheet and those only weakly connected (RGIFlag CL0 and CL1, respectively; ref. 45). This may lead to a smaller estimate of mass loss for the different periods; for example, we estimate a mass loss of 186.4 ± 18.9 Gt yr⁻¹ for the period 2003-2010, while using the same ice mask as ref. 18 we arrive at a mass loss of 250.1 ± 21.2 Gt yr⁻¹.

Third, propagation of thinning at the ice margin towards the interior is not incorporated in the present model, as we use a scaling approach based on point-based thinning observations at the periphery of the ice sheet to estimate the mass balance. Model experiments suggest that mass loss at lower elevations would propagate inland and cause interior thinning on decadal timescales, and would continue inland even if mass loss at the ice margin ceases. For all three periods we calculate a mass gain in the interior of the ice sheet, which since 1993 has also been identified by others. Verifying mass gain in the interior for the period LIAmax to 1978-87 (1983) is difficult because ice-core-derived estimates of ice surface elevation changes are associated with vertical uncertainties of about 70 m (ref. 53). Excluding the interior mass gain during LIAmax-1983 yields a mass loss of 7,712 ± 1,323 Gt (92.9 ± 15.9 Gt yr⁻¹), compared with the conservative estimate of 6,233 ± 2,436 Gt (75.1 ± 29.4 Gt yr⁻¹).

Even though the individual contribution of the abovementioned assumptions to mass loss may be considered minor, the combined effects may be considerable. However, given the limitations in the present model configuration and the lack of observations to constrain the behaviour of the interior mass balance, we favour the conservative estimate presented in this paper and emphasize that it should be regarded as a minimum contribution from the GIS to GMSL rise during the period LIAmax to 1978-87 (1983).
SMB modelling. The near-surface air temperature $T$ and the land-ice SMB (that is, total precipitation minus total sublimation minus runoff) reconstruction of ref. 10, spanning 1840-2012, is calibrated to RACMO2.1/GR output (ref. 11). The calibration is important because SMB fields from RACMO2.1/GR are used as input to convert the total elevation change during 2003-2010 ($dh_{2003\text{-}2010}$) into the elevation change due to solid-ice changes ($dh_{\mathrm{solid}}$) using a firn-compaction model (see Methods section ‘2003-2010 elevation changes from air- and space-borne laser altimetry’). Thus, because we use $dh_{\mathrm{solid}}$ to calculate mass balance estimates during LIAmax-1983 and 1983-2003, it is critical that the two SMB models are comparable when assessing the components of the twentieth-century mass balance. Furthermore, because the availability of the ice core data from which snow accumulation is derived in the SMB model of ref. 10 decreases sharply after 1999, the model incorporates precipitation fields from RACMO2.1/GR. The calibration of $T$ and the SMB components excluding snow accumulation employs a 53-year overlap period (1960-2012), whereas snow accumulation is calibrated during the 1960-1999 period. The calibration employs linear regression coefficients at each 5-km grid cell that match the multi-year average of the reconstruction with that from RACMO2.1/GR. Prior to calibration, the RACMO2.1/GR data are resampled and reprojected from their native 0.1° (~11 km) grid to the 5-km grid employed by ref. 10. A 5%-8% correction is applied to SMB totals to account for the variation of the 5-km polar stereographic grid cell area with latitude.
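The per-cell calibration can be sketched as an ordinary least-squares (OLS) regression in every grid cell over the overlap years. Because an OLS line passes through the means of both series, the calibrated reconstruction reproduces the multi-year reference average over the overlap, as the text requires. The OLS form, the array layout and the function name are our assumptions; the 5%-8% grid-area correction is not included.

```python
import numpy as np

def calibrate_per_cell(recon, reference, overlap):
    """Per-cell linear calibration of a reconstruction against a reference model.

    recon:     (n_years, ny, nx) reconstruction over the full period
    reference: (n_overlap, ny, nx) reference values (e.g. RACMO2.1/GR) for the overlap years
    overlap:   boolean mask of length n_years selecting the overlap years in `recon`
    Returns the calibrated reconstruction and the slope and intercept grids.
    """
    x = recon[overlap]
    xm, ym = x.mean(axis=0), reference.mean(axis=0)
    slope = ((x - xm) * (reference - ym)).mean(axis=0) / ((x - xm) ** 2).mean(axis=0)
    intercept = ym - slope * xm          # line passes through the overlap means
    return slope * recon + intercept, slope, intercept
```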

Refinements are applied to the original SMB reconstruction as follows. (1) Values are now estimated over all land, sea, and ice within the domain, rather than over only ice. (2) A physically based meltwater retention scheme replaces the original simpler approach. (3) Multiple stations now contribute to the $T$ value for each given month and grid cell within the domain, rather than employing the single highest-correlating station. (4) The RACMO2.1/GR data used for calibration have a higher native resolution (~11 km) than the Polar MM5 data (~24 km) used to calibrate the original SMB reconstruction. (5) The SMB reconstruction now extends to 2012, rather than 2010. (6) The ice-core-derived annual accumulation rates are divided into monthly temporal resolution by weighting the monthly fraction of annual accumulation according to the 1960-2012 average RACMO2.1/GR seasonal distribution at each grid cell.
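Refinement (6) amounts to distributing each annual total according to a mean seasonal cycle. A minimal sketch, assuming the monthly climatology (for example, a 1960-2012 RACMO2.1/GR average) is supplied per grid cell with months on the first axis; the function name is hypothetical:

```python
import numpy as np

def annual_to_monthly(annual, monthly_clim):
    """Split annual accumulation into 12 months using a mean seasonal cycle.

    annual:       (n_years, ...) annual accumulation
    monthly_clim: (12, ...) mean monthly accumulation for the same cells
    Returns (n_years, 12, ...) monthly values that sum back to `annual` over the month axis.
    """
    annual = np.asarray(annual, dtype=float)
    monthly_clim = np.asarray(monthly_clim, dtype=float)
    frac = monthly_clim / monthly_clim.sum(axis=0, keepdims=True)   # fractions sum to 1
    return annual[:, None, ...] * frac[None, ...]
```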

Absolute uncertainty for the revised SMB estimates from ref. 10 is estimated 
by comparing against field data. In situ annual ablation rates (n = 208), spanning 
1985-1992, yield an ablation root-mean-square error of 35%. This is analogous to 
an in situ comparison with RACMO2.1/GR. Comparison of revised SMB estimates from ref. 10 (or RACMO2.1/GR) with ice-core-derived net accumulation time series from 86 sites yields a 30% accumulation root-mean-square error.

A fundamental assumption is that the calibration regression factors (slope and intercept), derived on a grid cell basis during 1960-2012 against ice cores, meteorological station temperatures, and RACMO2.1/GR, are stationary in time. Testing this, we find that over the 53-year overlap period (1960-2012) cumulative SMB anomalies drift between the reconstruction and RACMO2.1/GR by up to 600 Gt, compared with a total mass flux of 24,000 Gt, suggesting a drift uncertainty of 2.5%. In the pre-1960 period, the cumulative uncertainty may be larger.
Temporal variability of the mass balance. To assess the variability of the mass balance during the twentieth century, we use an approach similar to that of other studies. Here, iceberg discharge is estimated using a linear regression between reconstructed meltwater runoff from revised SMB estimates from ref. 10 and estimates of ice-sheet-wide iceberg discharge spanning 2000-2012 (ref. 21). We find a peak correlation ($r = 0.87$; $P < 0.01$; degrees of freedom = 12) between annual iceberg discharge $D$ and six-year mean meltwater runoff $\overline{R}_{6}$, calculated from the five years preceding and including a given year. $D_{\mathrm{m}}$ is modelled ice discharge in gigatonnes per year and is calculated as follows:


$$D_{\mathrm{m}} = 0.766\,\overline{R}_{6} + 266 \tag{11}$$


This is similar to employing a correlation between five-year lagging meltwater runoff and annual iceberg discharge. We use the discharge estimates of ref. 21, although we note that these estimates generally lie within the uncertainties of other studies except those of ref. 24 (Extended Data Fig. 7). Uncertainties related to the temporal mass balance method are calculated using Monte Carlo simulation (see Methods section ‘Estimating uncertainties using Monte Carlo simulation’).
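Equation (11) needs a six-year trailing mean of runoff, that is, the mean of a given year and the five preceding years. A sketch using cumulative sums, with runoff in Gt yr⁻¹ and NaN returned for the first five years, where no full window exists (how the series start is handled is not stated above); the function names are hypothetical:

```python
import numpy as np

SLOPE, INTERCEPT = 0.766, 266.0   # coefficients of equation (11)

def trailing_mean(runoff, window=6):
    """Mean of each year and the preceding window-1 years; NaN until a full window exists."""
    r = np.asarray(runoff, dtype=float)
    out = np.full(r.shape, np.nan)
    csum = np.cumsum(np.insert(r, 0, 0.0))        # csum[k] = sum of the first k values
    out[window - 1:] = (csum[window:] - csum[:-window]) / window
    return out

def modelled_discharge(runoff):
    """Equation (11): D_m = 0.766 * R6 + 266, with R6 the six-year trailing mean runoff."""
    return SLOPE * trailing_mean(runoff) + INTERCEPT
```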
Estimating uncertainties using Monte Carlo simulation. We implement a Monte Carlo uncertainty approach that accounts for the interaction of uncertainties in mass balance components. The residual root-mean-square differences between revised SMB estimates from ref. 10 and RACMO2.1/GR are increased by 50% to form a conservative uncertainty estimate, given that the absolute uncertainty may be larger than the calibration root-mean-square difference. The post-calibration root-mean-square difference for runoff is increased by 50%, yielding an assumed conservative uncertainty of 24.9%. That for accumulation is 8.0% and that for SMB is 22.5%. These are relative uncertainties between RACMO2.1/GR and revised SMB estimates from ref. 10. Absolute uncertainty is evaluated relative to field data.

Because iceberg discharge is a function of runoff, the runoff uncertainty is propagated through equation (11) to estimate iceberg discharge uncertainty. The temporal mass balance uncertainty is estimated as 78 Gt yr⁻¹. Extended Data Fig. 8 shows the Monte Carlo simulation for the temporal mass balance expressed as cumulative eustatic sea level change during 1840-2012.
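A Monte Carlo sketch of this propagation, using the relative uncertainties quoted above (22.5% for SMB, 24.9% for runoff) and 4,000 samples, the number given for Extended Data Fig. 8. It assumes that the annual mass balance is SMB minus modelled discharge from equation (11) and draws independent, normally distributed relative errors for every year and sample, treating SMB and runoff errors as independent. How the components are coupled in the original analysis is not described here, so this is a simplification, and the function name is hypothetical.

```python
import numpy as np

def monte_carlo_mass_balance(smb, runoff, n_samples=4000, rel_smb=0.225,
                             rel_runoff=0.249, seed=0):
    """Simulated annual mass balance MB = SMB - D_m (Gt/yr), shape (n_samples, n_years - 5).

    Independent, normally distributed relative errors are an assumption of this sketch.
    """
    rng = np.random.default_rng(seed)
    smb = np.asarray(smb, dtype=float)
    runoff = np.asarray(runoff, dtype=float)
    shape = (n_samples, smb.size)
    smb_s = smb * (1.0 + rel_smb * rng.standard_normal(shape))
    runoff_s = runoff * (1.0 + rel_runoff * rng.standard_normal(shape))
    # Six-year trailing mean of runoff: 'valid' output k is the mean of years k..k+5,
    # so it belongs to year k+5 and the first five years have no discharge estimate.
    kernel = np.full(6, 1.0 / 6.0)
    r6 = np.array([np.convolve(row, kernel, mode="valid") for row in runoff_s])
    d_m = 0.766 * r6 + 266.0                                  # equation (11)
    return smb_s[:, 5:] - d_m

mb = monte_carlo_mass_balance(np.full(20, 400.0), np.full(20, 250.0))
spread = mb.std(axis=0)        # per-year 1-sigma spread of the simulated mass balance
```

With constant inputs of 400 Gt yr⁻¹ SMB and 250 Gt yr⁻¹ runoff, the mean mass balance is 400 − (0.766 × 250 + 266) = −57.5 Gt yr⁻¹, and the spread is dominated by the SMB term because averaging over six years damps the runoff errors.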

Data. We use aero-triangulated vertical stereo photogrammetric imagery recorded during 1978-1987 to manually map the former ice extent during the LIAmax. Raw imagery was made available for research purposes by The Danish Geodata Agency, a part of the Danish Ministry of Energy, Utilities and Climate. The derived products used in this study, such as orthophotos and the DEM, are available in GeoTiff format upon request to the corresponding author. Moreover, we use all available ICESat GLA12 Release 31 data (https://nsidc.org/data/icesat/data.html), all available NASA ATM flight lines (ref. 35) between 2003 and 2010 (http://nsidc.org/data/blatm2 and https://nsidc.org/data/ilatm2) and NASA's LVIS flight lines (ref. 40) from 2010 (https://nsidc.org/data/ilvis2). Information on SMB data from RACMO2.1/GR (ref. 11) is available at http://www.projects.science.uu.nl/iceclimate/models/greenland.php, while information on the SMB reconstruction is available from ref. 10. To model ice discharge we use ice discharge estimates from ref. 21.

Code availability. Data analyses have been performed using the SOCET SET 5.6 software package (written by BAE Systems), ArcGIS 10.1 (written by Esri Inc.), and custom-built routines for Python, Matlab and Fortran. The codes are not available.
Extended Data Figure 1 | The dh calculation scheme. a, Three simulated ice surface profiles based on Glen's flow law, each representing time steps t1 (blue dots), t2 (red dots), and t3 (black dots). b, The same profiles as in a, supplemented with the predicted profile $h_{\mathrm{pre\_t3}}$ (grey line) derived using an $S$ value of 2.2. The figure shows agreement between the profiles $h_{\mathrm{t3}}$ and $h_{\mathrm{pre\_t3}}$; hence, if we know the elevation change during one period (for example, between t1 and t2), then it is possible to obtain the elevation change during another period (for example, between t1 and t3) by multiplying by a constant $S$. c, The elevation changes between t1 and t2 ($dh_{\mathrm{t1t2}}$, blue line) and between t1 and t3 ($dh_{\mathrm{t1t3}}$, brown line). The black dots are the elevation changes between t1 and the predicted surface profile $h_{\mathrm{pre\_t3}}$, derived using the elevation change between t1 and t2 and an $S$ value of 2.2. The predicted difference ($dh_{\mathrm{t1t3\_pre}}$) between t1 and t3 is derived from $dh_{\mathrm{t1t2}}$ and a constant, implying that the surface profile at t1 is part of both the input and the output. d, $dh_{\mathrm{t1t2}}$ (blue line), the elevation changes between t3 and t4 ($dh_{\mathrm{t3t4}}$, dark green line), and the predicted $dh_{\mathrm{t3t4}}$ ($dh_{\mathrm{t3t4\_pre}}$, black dots), which is derived using $dh_{\mathrm{t1t2}}$ and an $S$ value of 1.2; thus none of the ice surface profiles are part of both the input and the output. If both $dh_{\mathrm{t1t2}}$ and $dh_{\mathrm{t3t4}}$ are known, then $S$ can be derived as the ratio between the observations. e, The uncertainty, given by the difference between the profiles $h_{\mathrm{t3}}$ and $h_{\mathrm{pre\_t3}}$, when using a constant $S$. Generally the differences are small, though they increase near the margin. f, The elevation change between two time steps as a function of elevation. Changes are largest at lower elevations and become close to 0 at $h > 2{,}500$ m.
Extended Data Figure 2 | Validation of the scaling approach. a, Elevation profiles of Kangerlussuaq Glacier in southeast Greenland from the 1981 DEM (grey line), 2003 ATM data (red line), and the predicted surface profile (blue line) in 2003, derived using the scaling approach based on local scale values and the 2003-2010 elevation changes ($dh_{\mathrm{solid}}$). (For a more complete description of the approach using observations, see Methods section ‘LIAmax to 1978-87 mass balance’.) b, The elevation change rate between the observed 2003 surface profile (red) and the predicted 2003 surface profile (blue) relative to the 1981 DEM. The blue vertical lines denote uncertainty estimates that include an uncertainty related to the scaling approach, an error related to observed changes during 2003-2010, and an uncertainty related to the scaling of point-based observations. The red vertical lines denote an uncertainty associated with the observed elevation changes during 1981-2003 and include the combined errors of the measured heights derived from the stereo-photogrammetric DEM and 2003 ATM data. c, A 1981 orthophoto of Kangerlussuaq Glacier with 2003 ATM data (red dots) and the May 2003 glacier front (black line). d-f and g-i illustrate the same as a-c for Helheim Glacier and Jakobshavn Isbræ, respectively. However, for Jakobshavn Isbræ the DEM and orthophoto are from 1985. Note the different scales for each of the glaciers. Comparing the elevation change rates derived from the scaling approach with those derived directly from the observations, we find good agreement, as the error bars overlap. Thus, we regard the illustrated comparison as a validation of our method of deriving ice-sheet-wide mass balance estimates.
Extended Data Figure 3 | GIS calculation basin subdivision. Calculation basins modified from ref. 44 to include slower-moving areas of the ice sheet. Note that three areas on the southeast coast have been omitted due to an insufficient number of LIA to 1978-87 data points caused by extensive snow cover on the vertical images. The total ice mask covers 1,647,907 km². The additional areas included in the ice mask used by ref. 18 are shown in dark grey; including these, the ice mask covers 1,739,564 km².
Extended Data Figure 4 | Mapping elevation changes during LIAmax to 1978-87. a, Type 1 points are placed at the trimline or lateral moraine marking the LIAmax position and at the 1978-87 ice surface perpendicular to the flow direction. As we assume that the cross-section profile of the glacier is the same during the LIAmax and 1978-87, the vertical difference dh is the thinning at this location. This approach is the same as that used by ref. 9. b, For type 2 points we assume that the longitudinal shape of the glacier is the same during the LIAmax as in 1978-87. Points are placed at the LIAmax margin and at the 1978-87 margin and, assuming a longitudinal profile that does not change over time, the distance dL is used to find the vertical difference between the 1978-87 point and a point on the glacier at a distance of dL following the same flowline. Points for glaciers receding on steep slopes have been discarded.


© 2015 Macmillan Publishers Limited. All rights reserved 


[Map panels a and b of Extended Data Fig. 5 (60° N to 80° N, 80° W to 10° W). Legend: elevation change (m): 0 to −9, −10 to −25, −25 to −50, −50 to −100, −100 to −200, −200 to −300, −300 to −448.]
Extended Data Figure 5 | Distribution and values of $dh_{\mathrm{LIA}}$ points. a, Distribution of the two point types used to determine thinning between LIAmax and 1978-87. b, From the type 1 and type 2 points, net elevation change $dh_{\mathrm{LIA}}$ is measured based on 3,003 point measurements from the LIAmax to 1978-87. Of the 3,003 pairs (that is, 6,006 point measurements), 2,476 are measured as type 1 and 527 are type 2. The majority of the type 2 points are found along the land-terminating and slower-moving parts of the ice sheet, whereas type 1 points are found in valleys through which the ice flows and on nunataks. $dh_{\mathrm{LIA}}$ values range between zero and −448 m (a negative value implies thinning). The largest $dh_{\mathrm{LIA}}$ values are found along the major marine-terminating outlet glaciers on the northwest and southeast coasts and along the rim of the Qassimiut lobe (QL), while in contrast the lower $dh_{\mathrm{LIA}}$ values are found along the slower-moving margins of the typically land-terminating ice sheet. In some areas around the ice sheet no trimlines are visible and/or the ice margin is in contact with the LIA moraines. Analysis of glacier front positions for outlet glaciers in the north, central west, southwest, and south using historical aerial photographs from the 1930s onwards suggests that a few outlet glaciers, primarily land-terminating, have been stable or advanced since the LIA. In the northwest, central west, and southwest, snow cover on the 1978-87 vertical aerial images is generally limited, which eases the distinction between freshly eroded bedrock, newly deposited glacial sediment, and non-eroded vegetated terrain surfaces. This supports the notion that if no trimline is visible on the photographs, then the ice margin is at an advanced and stable stage. Hence, the $dh_{\mathrm{LIA}}$ and $dL_{\mathrm{LIA}}$ values for these points are zero. An example of a glacier that has advanced during the twentieth century is Saqqap Sermia (SS) in the Nuup Kangerlua (Godthåbsfjord) complex in southwest Greenland. Here no trimlines are visible along the valley, the boundary between ice and vegetation cover is only interrupted by small meltwater channels, and at the glacier front no end moraines are visible on the meltwater plain. In the present setup we are not able to assign any post-LIA mass gain; however, as only a limited number of outlet glaciers have advanced and exceeded the LIA front position during the twentieth century, we regard this mass gain as negligible relative to the ice-sheet-wide mass loss.
Extended Data Figure 6 | Horizontal and vertical displacements in aero-photogrammetric DEM. a-c, Histograms of the horizontal (a) and vertical (b) co-registration displacements for each 50 km × 50 km grid cell show that the aero-photogrammetric DEM compilation is generally accurate to within 10 m horizontally and 6 m vertically, with a precision better than 4 m (1σ confidence level) (c). d-f, The horizontal (d) and vertical (e) components of the co-registration vectors between 50 km × 50 km sections of the aero-photogrammetric DEM compilation and ICESat laser altimetry are plotted with the root-mean-square error of stable terrain differences after adjusting for the three-dimensional mis-registration (f). [Legend: sample size < 200 points.]




[Plot: ice discharge (Gt/yr), 350-700, versus year, 1960-2010.]


Extended Data Figure 7 | Estimates of ice-sheet-wide iceberg discharge. 


Ice discharge estimates and associated errors (vertical bars) from ref. 24 (black), ref. 3 (blue), ref. 21 (red), and ref. 56 (grey). We note that the discharge estimates of ref. 21 used here are 15 Gt yr⁻¹ greater than those of ref. 56, 30 Gt yr⁻¹ less than those of ref. 3, and 110 Gt yr⁻¹ less than those of ref. 24. Such discrepancies are attributed to differences in data availability, in the assumptions used for filling gaps, or in the method used to correct for SMB between the inland flux gates and the grounding lines.
Extended Data Figure 8 | Temporal variability of the mass balance estimates from ref. 10
expressed as cumulative eustatic sea level rise. Reconstructed temporal
mass balance during the period 1840–2012 derived using revised SMB and modelled
ice discharge, calculated as a function of six-year average runoff. The uncertainty
is assessed from a Monte Carlo simulation using 4,000 samples for each year.
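The caption states only that uncertainty was assessed by Monte Carlo simulation with 4,000 samples per year. The sketch below illustrates that general idea for a single year under explicitly hypothetical assumptions: mass balance is SMB minus discharge, discharge is a linear function of six-year average runoff, and all inputs carry independent Gaussian errors. The coefficients and input values are placeholders, not values from ref. 10.

```python
import random
import statistics


def monte_carlo_mass_balance(smb, smb_sd, runoff, runoff_sd,
                             slope, intercept, n_samples=4000, seed=0):
    """Propagate input errors to mass balance (Gt/yr) by random sampling.

    Hypothetical model: discharge = intercept + slope * runoff, and
    mass balance = SMB - discharge; errors are independent Gaussians.
    """
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    samples = []
    for _ in range(n_samples):
        smb_draw = rng.gauss(smb, smb_sd)
        runoff_draw = rng.gauss(runoff, runoff_sd)
        discharge = intercept + slope * runoff_draw
        samples.append(smb_draw - discharge)
    return statistics.mean(samples), statistics.stdev(samples)


mean_mb, sd_mb = monte_carlo_mass_balance(
    smb=400.0, smb_sd=30.0, runoff=250.0, runoff_sd=20.0,
    slope=0.5, intercept=350.0)
print(f"mass balance {mean_mb:.1f} +/- {sd_mb:.1f} Gt/yr")
```

For this linear toy model the sampled spread should approach the analytic value sqrt(30² + (0.5 × 20)²) ≈ 31.6 Gt yr⁻¹; sampling becomes useful when the discharge–runoff relation is nonlinear.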
Grassland biodiversity bounces back from long-term nitrogen addition


J. Storkey, A. J. Macdonald, P. R. Poulton, T. Scott, I. H. Kohler, H. Schnyder, K. W. T. Goulding & M. J. Crawley


The negative effect of increasing atmospheric nitrogen (N) pollution
on grassland biodiversity is now incontrovertible. However, the
recent introduction of cleaner technologies in the UK has led to
reductions in the emissions of nitrogen oxides, with concomitant
decreases in N deposition. The degree to which grassland biodiversity
can be expected to ‘bounce back’ in response to these improvements
in air quality is uncertain, with a suggestion that long-term chronic
N addition may lead to an alternative low biodiversity state. Here
we present evidence from the 160-year-old Park Grass Experiment
at Rothamsted Research, UK, that shows a positive response of
biodiversity to reducing N addition from either atmospheric pollution 
or fertilizers. The proportion of legumes, species richness and 
diversity increased across the experiment between 1991 and 2012 as 
both wet and dry N deposition declined. Plots that stopped receiving 
inorganic N fertilizer in 1989 recovered much of the diversity that 
had been lost, especially if limed. There was no evidence that chronic 
N addition has resulted in an alternative low biodiversity state on the 
Park Grass plots, except where there has been extreme acidification, 
although it is likely that the recovery of plant communities has been 
facilitated by the twice-yearly mowing and removal of biomass. This 
may also explain why a comparable response of plant communities 
to reduced N inputs has yet to be observed in the wider landscape. 
Total emissions of oxidized plus reduced N from intensive agriculture
and the burning of fossil fuels increased markedly from the middle of the
twentieth century in industrialized nations. There is strong evidence
from comparisons of similar habitats along N deposition gradients
that these increases have led to declining biodiversity in semi-natural
ecosystems through acidification and eutrophication. These ‘space-for-time’
studies assume that air pollution has only increased, and that
the deposition gradient is representative of a unidirectional temporal
shift in grassland biodiversity. Since the late 1980s, however, measures to
reduce atmospheric pollution have successfully reduced UK emissions
of NOx by ~50% and of sulfur (S) by ~90% (ref. 9). Quantifying the
potential recovery of biodiversity in response to reducing air pollution
requires an alternative to the space-for-time approach, ideally monitoring
long-term community dynamics on permanent plots. In this
context, the Park Grass Experiment at Rothamsted, which started in
1856, presents a unique opportunity to study shifts in biodiversity in
response to environmental change both pre- and post-industrialization.
Park Grass consists of permanent plots with different fertilizer treat- 
ments that were established on a uniform pasture that was at least 100 
years old in 1856. In the early 1900s, most plots were divided in two, 
and lime was applied to one half—designated the limed (L) or unlimed 
(U) sub-plots. In 1965, the limed sub-plots were further split into 
sub-plots ‘a’ and ‘b’, and the unlimed sub-plots were further divided
into sub-plots ‘c’ and ‘d’. Since this time, varying amounts of lime have
periodically been added to maintain a target pH of 7, 6 and 5 for sub- 
plots a, b and c, respectively; sub-plot d is left unlimed (Extended Data 
Table 1). The liming treatments mean that the eutrophication effect of 
atmospheric N deposition on plant community dynamics can be quantified
independently of soil pH (which also responds to changes in S
deposition). Park Grass is in a semi-urban environment, close to a road
and on the edge of the town of Harpenden, which act as local sources
of atmospheric pollutants. Local measurements of ammonium and
nitrate deposited in rainfall show that they have both declined by a
comparable amount since 1985, and reflect the current national downward
trend in total N emissions (Fig. 1). Our measurements did not
include all species of N, including ammonia; however, estimated total N
deposition for grassland at this site in 2010–2012, including all N species,
was ~21 kg ha⁻¹ yr⁻¹ (http://www.apis.ac.uk) compared with
~45 kg ha⁻¹ yr⁻¹ measured in 1996 (ref. 4). Analysis of the N concentration
in archived herbage samples from the first hay cut from the plot
that has never received any fertilizer inputs also showed a significant
decline in percentage N since the 1980s (Fig. 2).

Figure 1 | Changes in atmospheric N deposition, pH and proportion of
legumes on the limed and unlimed sub-plots of the Park Grass nil plot.
a, Changes in wet (×) and dry (+) N deposition; line indicates moving
5-year average (the small increase in the early 2000s may be a legacy of a
run of mild winters in the 1990s). b, Change in proportion of legumes
(by dry weight) measured in the first herbage cut. Lines indicate the change
in decadal average of percentage legumes on the limed plot (blue circles,
blue dashed line), or unlimed plot (red circles, red dashed line). The plots
had an average pH over the period of 7.0 when limed (blue diamond, blue
continuous line) and 5.2 when unlimed (red diamond, red continuous line).

Figure 2 | Change in N concentration measured on archived herbage
samples taken between 1960 and 2012 from Park Grass sub-plot 3d, which
has never received any external inputs of lime or fertilizer.
A split-line regression fitted to the data has an intersection point of 1980
(95% confidence limits, 1976 and 1988), with a significant decline after
1980 (R = 0.67, P < 0.001 using least squares linear regression). During
the sampling period, legumes never exceeded 5% of total biomass, and the
contribution of nitrogen fixation to N in the herbage is therefore expected
to be minimal.
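Figure 2 summarizes the herbage N record with a split-line (broken-stick) regression whose intersection falls in 1980. One common way to fit such a model, sketched below on made-up numbers, is a grid search over candidate breakpoints: fit a continuous two-segment least-squares line at each and keep the breakpoint with the smallest residual sum of squares. The paper does not say which software or search it used, so treat this as an illustration of the technique rather than the authors' procedure; the confidence limits quoted in the caption are not computed here.

```python
def _solve(a, b):
    """Solve a x = b for a small dense system (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_hinge(xs, ys, t):
    """Fit y = b0 + b1*(x - xbar) + b2*max(0, x - t) by least squares.

    Returns (slope before t, slope after t, RSS). Centring x on its mean keeps
    the normal equations well conditioned for calendar years.
    """
    xbar = sum(xs) / len(xs)
    design = [[1.0, x - xbar, max(0.0, x - t)] for x in xs]
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(3)] for i in range(3)]
    xty = [sum(row[i] * y for row, y in zip(design, ys)) for i in range(3)]
    b0, b1, b2 = _solve(xtx, xty)
    rss = sum((y - (b0 + b1 * row[1] + b2 * row[2])) ** 2 for row, y in zip(design, ys))
    return b1, b1 + b2, rss


def split_line(xs, ys, candidates):
    """Grid-search the breakpoint with the lowest residual sum of squares."""
    best = None
    for t in candidates:
        before, after, rss = fit_hinge(xs, ys, t)
        if best is None or rss < best[3]:
            best = (t, before, after, rss)
    return best


# Made-up series: flat at 2.0 % N until 1980, then falling by 0.02 % N per year.
years = list(range(1960, 2013, 4))
n_pct = [2.0 if y <= 1980 else 2.0 - 0.02 * (y - 1980) for y in years]
t, before, after, rss = split_line(years, n_pct, range(1966, 2007))
print(t, round(before, 6), round(after, 6))
```

The candidate range is kept inside the data so that each segment has points on both sides of the breakpoint.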

We analysed data on relative biomass of vascular plant species sam- 
pled on a range of sub-plots between 1903 and 2012, with a focus on 
both the ‘nil plot’ (plot 3), which has never received any fertilizers, 
and the ‘transition plots’ (Extended Data Table 2). The latter received 
96 kg N ha⁻¹ (plus P, K, Na and Mg), either as ammonium sulfate
(plot 9) or sodium nitrate (plot 14) until 1989, when the plots were split. 
Since then, no further N was applied to one-half of the plots (now 9/1 
and 14/1). The original treatment continued on the remaining halves 
(now plots 9/2 and 14/2). Generalized linear models (GLMs) and mixed 
models (GLMMs) were used to quantify the effect of changes in wet 
atmospheric N deposition (measured as either a 3- or 5-year moving 
average) on the proportion of plant functional groups, species richness 
and the exponent of the Shannon diversity index (e^H′). Temporal trends
in relative species abundance and dissimilarity between plots were also 
analysed using multivariate methods. 
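Species richness and e^H′ can both be computed from the sorted dry-weight data for a sample. A minimal sketch, assuming H′ uses natural logarithms and each species' share of total dry weight (the text states neither the log base nor the abundance measure explicitly), so that e^H′ reads as an effective number of equally abundant species; the sample values are hypothetical:

```python
import math


def richness_and_exp_shannon(dry_weights):
    """Return (species richness, e^H') from {species: dry weight} for one sample.

    H' = -sum(p_i * ln p_i), where p_i is a species' share of total dry weight.
    Species with zero weight are ignored.
    """
    weights = [w for w in dry_weights.values() if w > 0]
    if not weights:
        raise ValueError("sample contains no biomass")
    total = sum(weights)
    h = -sum((w / total) * math.log(w / total) for w in weights)
    return len(weights), math.exp(h)


# Hypothetical sample: one dominant grass and three minor species.
sample = {"Alopecurus pratensis": 70.0, "Agrostis capillaris": 10.0,
          "Trifolium pratense": 10.0, "Plantago lanceolata": 10.0,
          "Lathyrus pratensis": 0.0}
richness, eff_species = richness_and_exp_shannon(sample)
print(richness, round(eff_species, 2))
```

Four species are present, but because one dominates, e^H′ falls well below 4; with equal shares it would equal the richness.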

On the nil plot, the proportion of legumes tracked changes in atmos- 
pheric N deposition, declining to low relative abundance at the end of 
the twentieth century before showing a degree of recovery over the 
recent sampling period (Fig. 1 and Extended Data Table 3). The addi- 
tion of lime also increased the proportion of legumes and forbs at the 
expense of grasses. A decrease in pH was observed between 1985 and 
1991 resulting from the deposition of S and N that was not always 
compensated for by the addition of lime (Extended Data Table 1).
It is likely that this contributed to some of the observed decline in the 
proportion of legumes at this time. However, in the recent sampling 
period (1991-2012), pH has largely remained constant while N depo- 
sition has continued to decline, and pH was not significantly correlated 
with wet N deposition in any of the models, allowing them to be treated 
as independent variables. Comparisons of species richness between 
historical sampling periods are confounded by the fact that the area 
sampled and the protocol used have changed through time. However, as the
Simpson diversity index has a low sensitivity to sample size, it can be 
used as an indication of temporal trends in species diversity on Park 
Grass (Fig. 3). A decline in diversity was observed on the nil sub-plots 
between the 1940s and 1990s; declines were steeper on the unlimed
sub-plot because of the combined effect of eutrophication and acidification.
The latest samples, taken since 2010, show diversity is recovering,
although it is still at levels below those recorded in the 1930s and 1940s on
the unlimed nil plot (Extended Data Table 3). An analysis across all the
individual sub-plots sampled confirmed the positive effect of decreasing
atmospheric deposition on plant species richness and diversity, as well
as an increase in the proportion of legumes (Extended Data Table 4).

Figure 3 | Historical trends in Simpson’s plant diversity index between
1910 and 2012 for unlimed and limed sub-plots. a, b, Results are
from unlimed (a) and limed (b) sub-plots. Decadal mean and s.e.m. are
presented for: plot 3 (no fertilizers; blue circles), plot 7 (PKNaMg; green
circles), plot 9/2 (PKNaMg plus 96 kg N ha⁻¹ applied as ammonium
sulfate; red circles), plot 14/2 (PKNaMg plus 96 kg N ha⁻¹ applied as
sodium nitrate; purple circles), plot 9/1 (N withheld since 1989; red open
circles, dashed line) and plot 14/1 (N withheld since 1989; purple open
circles, dashed line). For the limed plots in b, the data post-1965 are from
the ‘a’ sub-plots, as they are closest to the pH maintained on the limed
half of the plots before they were split. The relatively high value for plot
14/1d compared to plots 3d and 7d in the 1990s can be explained by the
temporary increase in diversity during the transition period. Plot 7a was
not sampled between 2010 and 2012.
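The text relies on Simpson's index for long-term comparisons because of its low sensitivity to sample size. The paper does not say which form of the index it reports, so the sketch below assumes the common inverse form 1/Σp_i² and also returns the Gini–Simpson form 1 − Σp_i², computed from hypothetical dry weights. It shows why the index is robust: a trace species, the kind that extra sampling effort mostly adds, barely changes it.

```python
def simpson_indices(weights):
    """Return (inverse Simpson 1/D, Gini-Simpson 1 - D), where D = sum(p_i ** 2)."""
    positive = [w for w in weights if w > 0]
    if not positive:
        raise ValueError("no positive abundances")
    total = sum(positive)
    d = sum((w / total) ** 2 for w in positive)
    return 1.0 / d, 1.0 - d


# Hypothetical dry weights (g): adding a trace species changes D very little.
common_only = [50.0, 30.0, 20.0]
with_trace = common_only + [0.1]
print(simpson_indices(common_only))
print(simpson_indices(with_trace))
```

Squaring the proportions means rare species contribute almost nothing to D, whereas species richness would rise by one.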
The expected increase in diversity and directional shift in species 
communities on the transition plots after the cessation of N fertiliza- 
tion in 1989 was observed, except on plot 9/1d, which continues to be 
constrained by very low soil pH (Figs 3 and 4). Plot 7, which receives 
the same amount of the other nutrients as plots 9/1 and 14/1 but has 
never had any N fertilizer additions, can be viewed as the plot towards 
which the transition plots should be moving. For the b sub-plots this
appears to be the case, but plot 9/1d is recovering only very
slowly from a low pH, and 14/1d still appears to have a community
that is intermediate between 14/2d and 7d. Over most of the recent
sampling period, the plant community dynamics on plots 9/2 and
14/2, which continued to receive N, unexpectedly showed a temporal
trend that was largely parallel with the transition plots (Extended Data
Figs 1 and 2 and Extended Data Tables 5 and 6). This suggests that the
effect of withholding N fertilizer became apparent within the first few
years of the treatment change, and since then all the plots have been
responding to the same underlying environmental trend. A comparison
of multivariate analyses using either year or +N fertilizer as the explanatory
variable showed that the species that responded to withholding
N on the transition plots were very similar to those driving the temporal
trends in plant communities observed on the wider experiment
(Extended Data Fig. 3). In particular, the abundance of the legume species
Trifolium pratense and Lathyrus pratensis, and of the forbs Plantago
lanceolata and Ranunculus acris, increased significantly when
N fertilizer was withheld and also across the whole experiment with
time. This was confirmed from the analysis at the sub-plot level for
plots 3, 9 and 14 (Extended Data Table 7).

Figure 4 | Temporal trends in plant communities, between 1991 and
2012, on selected Park Grass plots. a–d, Redundancy analysis ordinations
are presented for plot 9b (a), plot 9d (b), plot 14b (c) and plot 14d (d),
including transition plots (purple circles) and plots that have continued
to receive fertilizer N (grey circles). In each case, the samples from plot 7b
or 7d have been included as having the species composition to which the
transition plots are moving (green circles). Sub-plots a and c are
excluded as they were not sampled on plot 7 in 2010–2012. The size of the
symbols is proportional to the number of species in each sample, and the
relative proportions of the plant functional groups have been projected as
supplementary variables.

The positive responses of plant diversity to decreasing atmospheric
deposition and N fertilizer inputs on Park Grass show that grasslands
have the capacity to recover from the negative effect of eutrophication,
particularly where the confounding effect of decreasing pH has been
removed by applying lime. The fact that legumes showed the strongest
temporal response (coinciding with measured reductions in the N
concentration of the cut herbage on the nil plot) supports the view that
reducing N deposition was a causal factor in the observed community
dynamics. However, only wet N deposition was included in our models,
and we were unable to quantify the contributions from dry deposition
and other species of N, including ammonia and nitric acid, over a
sufficient time period. Sulfur emissions have also declined since the 1980s,
and although the liming treatments mean that the indirect effect of
changing S deposition on soil pH can be treated independently of the
eutrophication effect of N deposition, we cannot fully discount the
direct nutritional effect of S. However, the plots included in the analysis
with K, Na and Mg as part of the fertilizer treatment also receive up to
122 kg S ha⁻¹ yr⁻¹, meaning they are unlikely to be limited by S.

The continuity of the experimental treatments on Park Grass,
together with the measurement of atmospheric chemistry and plant
community data on the same local scale, avoids some of the problems
associated with attributing large-scale ecological changes observed in
national vegetation surveys to anthropogenic drivers. This may partly
explain why a clear signal of a recovery of plant diversity from eutrophication
has yet to be detected in the wider landscape. However, it
is also the case that the magnitude of the local-scale reductions in N deposition
we observed at Rothamsted is not yet reflected at the national
scale to the same degree. Interpreting changes on the Park Grass
Experiment more widely, in comparison with other grassland studies
and in terms of its relevance to the wider landscape, must also
take into account the specific management context. The twice-yearly
mowing and removal of biomass on Park Grass may explain the relatively
rapid transient dynamics observed on the experiment when compared
with equivalent studies in systems with less disturbance, leading to
the accumulation of litter, or dominated by slower-growing, woody
species. In addition, the close proximity of plots with differing plant
communities means that propagule limitation is not likely to be as
important a constraint on the recovery of the communities as may be
the case for larger-scale grassland restoration. Despite these considerations,
the Park Grass Experiment remains a unique indicator of the
effects of environmental change and an important part of the evidence
base for assessing the biological effects of changes in management or
policy on the wider environment.


Online Content Methods, along with any additional Extended Data display items and 
Source Data, are available in the online version of the paper; references unique to 
these sections appear only in the online paper. 
METHODS


Description of Park Grass Experiment and vegetation sampling protocol. The 
Park Grass Experiment was established on old grassland at Rothamsted in 1856 to 
examine the effects of different mineral fertilizers and organic manures on productivity
of permanent pasture cut for hay. The experiment is located on a moderately
well-drained silty clay loam overlying clay-with-flints, a chromic or vertic Luvisol
according to the FAO classification. The soil pH was slightly acidic when the experiment
began (5.4–5.6), and the nutrient status was poor. The original vegetation of
Park Grass was classified as dicotyledon-rich Cynosurus cristatus–Centaurea nigra
grassland, one of the mesotrophic grassland communities in the British National
Vegetation Classification system. Treatments imposed in 1856–1865 included
controls (nil, no fertilizer or manure), and various combinations of P, K, S, Mg
and Na, with N applied as either sodium nitrate or ammonium salts. Each plot
(ranging from 75 to 634 m²) now consists of plant communities adapted to the
fertilizer treatments naturally assembled from the local species pool. Farmyard 
manure was applied to two plots but was discontinued after 8 years, because when 
applied annually to the surface in large amounts it did not decompose quickly and 
had adverse effects on the sward. Farmyard manure, applied every 4 years, was 
re-introduced on three plots in 1905. 

The experiment consists of 20 main plots. The plots are cut in mid-June and 
made into hay. For 19 years, the re-growth was grazed by sheep penned on indi- 
vidual plots, but since 1875 a second cut, usually carted green, has been taken in 
place of grazing. The plots were originally cut by scythe, then by horse-drawn 
and then tractor-drawn mowers. Yields were originally estimated by weighing 
the produce, either of hay (first harvest) or green crop (second harvest), and dry 
matter was determined from the whole plot. Since 1960, yields of dry matter have 
been estimated from strips cut with a forage harvester. However, for the first cut 
the remainder of the plot is still mown and made into hay, continuing earlier man- 
agement and ensuring the return of seed. For the second cut, the whole plot is 
cut with a forage harvester. A small amount of lime, 4 t CaCO3 ha⁻¹, was added
to all plots in the late 1880s. Most plots were divided in two in 1903 or 1920 to
introduce a test of regular liming on one half. In 1965, they were further divided
into four sub-plots (a–d). The a, b and c sub-plots now receive lime every 3 years,
if necessary, sufficient to maintain a target soil pH of 7, 6 and 5, respectively. The
d sub-plots are unlimed. In 1989, the plots receiving 96 kg N ha⁻¹ were split, and
nitrogen fertilizer withheld from half of the plots to investigate the ability of the 
plant communities to recover from chronic nitrogen addition. 

Vegetation surveys have been carried out on Park Grass on more than 30 occasions
since the experiment began. The original botanical sampling protocol was
to take handfuls of cut herbage at regular intervals from every swath after the scythe 
or cutting machine. Each sample was then sub-sampled until a weight of approx- 
imately 12-20 lb (5.4-9.1 kg) was obtained. For the samples taken between 1973 
and 1976, samples were cut by hand every two to three paces along ten transects on 
the larger plots and six on the smaller plots. Approximately 600 g of material was 
analysed from each sub-plot. Between 1991 and 2012, above-ground biomass from 
six randomly located 50 × 25 cm quadrats was sampled from all sub-plots using a
standard protocol. The herbage was cut with scissors to ground level in early June, 
immediately before harvesting the first hay crop. The plant material was taken 
back to the laboratory where it was sorted into species. Samples were oven-dried 
at 80°C for about 24h, after which dry mass was determined for each species. Data 
from the six quadrats were aggregated to provide an estimate of species richness 
for each plot in each year. 

Monitoring of atmospheric nitrogen deposition: wet ‘bulk’ deposition. The
methods of collection and the amount of data available have varied over time.
Initial precipitation data (1853–1968) were collected using a rain gauge with a
surface area of one-thousandth of an acre (7 ft 3.12 in. × 6 ft, or 4.04 m²). The gauge
was constructed at ground level of lead supported by wood over a brick-lined
cellar housing four collection tanks, from which a sample of rain water was taken.
From 1969 to 1986, precipitation was collected in what is described as a ‘simple
funnel-and-bottle bulk gauge’. In the latter years, 1986 to present, precipitation
has been collected in a bulk rain water collector of a design described previously.
All these collection methods took place within the Rothamsted meteorological
enclosure, which is located approximately 817 m east-northeast of the Park Grass
Experiment. The amounts of nitrate-N and ammonium-N, in mg l⁻¹ in solution,
were then determined. The amount of nitrogen in kg ha⁻¹ deposited by wet deposition
is determined by the formula kg ha⁻¹ = (mg l⁻¹ × R)/(A × 10°), in which
R is the amount of rain water collected in mm, and A is the surface area of the collector
funnel in m². The total amount of nitrogen was then calculated for each year.
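The gauge dimensions quoted above do give one-thousandth of an acre: 7.26 ft × 6 ft = 43.56 ft², about 4.05 m². Because the scaling constant in the printed deposition formula is not legible in this copy, the sketch below derives the wet-deposition conversion from first principles instead, assuming the collected sample is expressed as a volume in litres over a collector of area A m². It is an illustration, not the authors' code. Since 1 litre spread over 1 m² is 1 mm of rain, the same result follows from concentration (mg l⁻¹) × rainfall depth (mm) / 100.

```python
FT_TO_M = 0.3048  # exact international foot in metres


def gauge_area_m2(length_ft, width_ft):
    """Area of a rectangular gauge in square metres."""
    return (length_ft * FT_TO_M) * (width_ft * FT_TO_M)


def wet_n_deposition_kg_ha(conc_mg_per_l, volume_l, area_m2):
    """N deposited (kg/ha) from concentration (mg N/l), volume (l) and area (m^2).

    mg N per m^2 = conc * volume / area; 1 mg m^-2 = 1e4 mg ha^-1 = 0.01 kg ha^-1.
    """
    mg_per_m2 = conc_mg_per_l * volume_l / area_m2
    return mg_per_m2 * 0.01


area = gauge_area_m2(7 + 3.12 / 12, 6)   # 7 ft 3.12 in x 6 ft
print(round(area, 3))                     # about 4.047 m^2, i.e. 1/1000 acre
# Hypothetical year: 700 mm of rain at a mean 1.5 mg N/l.
volume = 700 * area                       # 1 mm over 1 m^2 = 1 l
print(round(wet_n_deposition_kg_ha(1.5, volume, area), 2))  # 10.5 kg N/ha
```

Annual totals would then be the sum of such per-sample values over the year, as the text describes.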

Monitoring of atmospheric nitrogen deposition: dry deposition. Nitrogen
deposition in the form of NO2 was collected passively using diffusion samplers
over an exposure period of 2 weeks. The samplers are made up of a 30-µl
aliquot of 20% triethanolamine/water absorbent sandwiched between two
stainless steel meshes housed in a coloured thermoplastic rubber cap. Into this
cap, a 70-mm-long × 11-mm-diameter acrylic tube is inserted with a protective
white thermoplastic rubber cap on the opposite end. These samplers were then
placed, in sets of three, at locations around the edge of the Park Grass Experiment
at a height of 1.5 m above ground and the protective cap removed. After 2 weeks,
samplers were sealed using a protective cap and collected. They were then
extracted into 2 ml 18.2 MΩ RO (reverse osmosis) water and analysed for nitrite
N (NO2⁻-N) by continuous colourimetric flow analysis. The resulting levels of
NO2⁻-N in µg N m⁻³ were then averaged over the year and converted to kg N ha⁻¹
by multiplying by the deposition velocity for managed grassland at Rothamsted
(0.3751 mm s⁻¹).
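Converting an annual-mean air concentration to a deposition flux means multiplying by the deposition velocity and then scaling from per second to per year and from per square metre to per hectare. A minimal sketch, assuming the annual mean concentration applies continuously through a 365-day year (the text does not spell out the time integration); the concentration passed in is hypothetical:

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # assumes a 365-day year


def dry_n_deposition_kg_ha_yr(conc_ug_n_m3, v_d_mm_s=0.3751):
    """Annual dry N deposition (kg N/ha/yr) = concentration x deposition velocity.

    flux (ug N m^-2 s^-1) = C (ug N m^-3) * v_d (m s^-1);
    1 ug m^-2 = 1e-9 kg per m^2 = 1e-5 kg ha^-1.
    """
    v_d_m_s = v_d_mm_s / 1000.0
    flux_ug_m2_s = conc_ug_n_m3 * v_d_m_s
    return flux_ug_m2_s * SECONDS_PER_YEAR * 1e-5


# Hypothetical annual-mean NO2-N concentration of 5 ug N m^-3.
print(round(dry_n_deposition_kg_ha_yr(5.0), 3))
```

With the 0.3751 mm s⁻¹ velocity quoted above, each 1 µg N m⁻³ of annual-mean concentration corresponds to roughly 0.12 kg N ha⁻¹ yr⁻¹.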

Analysis of herbage samples for nitrogen concentration. Representative 
sub-samples of plant material from the archived hay or herbage samples on sub-plot
3d were dried at 40 °C for 48 h, ball-milled to a homogeneous fine powder,
dried again at 60 °C for 24 h and analysed with an elemental analyser (NA 1110;
Carlo Erba) interfaced (ConFlo II, Finnigan MAT) to a continuous-flow isotope
ratio mass spectrometer (Delta Plus; Finnigan MAT), EA-CF-IRMS. After every
tenth sample, a solid internal laboratory standard (SILS; finely ground wheat flour)
with a C/N ratio similar to that of the sample material was run as a control.
The precision (s.d.) for sample repeats was better than 0.04%. Samples taken from
herbage cut between 1960 and 2012 were analysed; before this date, the herbage
sampling protocol differed, with material dried in situ, which is affected by
disintegration losses in the hay-making process.

Statistics. For the plots where all the sub-plots were sampled (3, 9 and 14), all 
sub-sets regression using GLMs was used to identify the model that explained 
the maximum variability in species richness, e^H′ and the relative proportion of
functional groups using only independent explanatory variables with P < 0.05. 
The following explanatory variables were included: pH, wet atmospheric nitrogen 
deposition (included both as a 3- and 5-year moving average), total rainfall in the 
previous growing season (March–August) and rainfall in the current growing season
(March–May); rainfall has been found to explain short-term variability in community
composition significantly. For the proportion of the different functional
groups (legumes, grasses and ‘other’), a binomial distribution with a logit link 
function was used to allow for the variability in the total first cut biomass to be 
accounted for. The proportion of each functional group was analysed separately. 
A normal distribution with an identity link was used for species richness and e^H′
except for the acid plots with a high frequency of low species counts, in which a 
Poisson distribution with a log link was used. As opposed to a step-wise approach, 
all sub-sets regression analyses included all possible combinations of explanatory 
variables, using the adjusted R² and Mallows’ Cp as criteria for comparing models.
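All-subsets regression fits every combination of candidate predictors and ranks the fits by adjusted R² and Mallows' Cp. The sketch below does this for ordinary least squares with normal errors only (the binomial-logit and Poisson fits described above are not reproduced), using a small self-contained solver. Cp is computed as RSS_p / σ̂² − n + 2p, with σ̂² taken from the model containing all predictors and p counting the intercept. The predictor names and data are hypothetical.

```python
from itertools import combinations


def _solve(a, b):
    """Solve a x = b for a small dense system (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols_rss(columns, y):
    """Residual sum of squares for OLS of y on an intercept plus `columns`."""
    design = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * v for row, v in zip(design, y)) for i in range(k)]
    beta = _solve(xtx, xty)
    return sum((v - sum(b * x for b, x in zip(beta, row))) ** 2
               for row, v in zip(design, y))


def all_subsets(predictors, y):
    """Score every non-empty predictor subset as (subset, adjusted R^2, Cp), best Cp first."""
    n = len(y)
    names = list(predictors)
    ybar = sum(y) / n
    tss = sum((v - ybar) ** 2 for v in y)
    sigma2 = ols_rss([predictors[k] for k in names], y) / (n - len(names) - 1)
    results = []
    for size in range(1, len(names) + 1):
        for subset in combinations(names, size):
            rss = ols_rss([predictors[k] for k in subset], y)
            p = size + 1  # parameters including the intercept
            adj_r2 = 1.0 - (rss / (n - p)) / (tss / (n - 1))
            cp = rss / sigma2 - n + 2 * p
            results.append((subset, adj_r2, cp))
    return sorted(results, key=lambda r: r[2])


# Hypothetical data: the response depends on N deposition only.
ndep = [30, 28, 26, 24, 22, 20, 18, 16]
ph = [5.2, 5.0, 5.3, 5.1, 5.2, 5.0, 5.3, 5.1]
rain = [300, 350, 280, 320, 310, 290, 340, 300]
noise = [0.1, -0.2, 0.15, -0.05, 0.0, 0.1, -0.15, 0.05]
richness = [20 - 0.5 * d + e for d, e in zip(ndep, noise)]
for subset, adj_r2, cp in all_subsets({"ndep": ndep, "pH": ph, "rain": rain}, richness):
    print(subset, round(adj_r2, 3), round(cp, 2))
```

Unlike step-wise selection, every combination is evaluated, so the ranking does not depend on the order in which predictors are tried. By construction the full model always has Cp equal to its parameter count.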

For the nil treatment (plot 3), which has never received any fertilizers, data were
available on relative proportion of functional groups that covered an important 
historical period from 1903 in which wet nitrogen deposition and pH were also 
measured on the experiment. This whole data set (n = 70) was therefore included 
in the models. Although less frequent data were available on species richness and 
diversity, changes in the area sampled and sorting effort meant that a comparison of 
data over the whole historical period would not have been valid. However, between 
1991 and 2012, a standard area and sampling protocol was used. This time period 
coincided with the reductions in atmospheric nitrogen deposition observed on the 
experiment and was, therefore, used to quantify responses of species richness and 
e^H′ (n = 13) across all the plots.

For the analysis of the data from all sub-plots sampled across a range of fertilizer 
treatments, a GLMM was used with sub-plot and year input as random factors. 
Fixed effects were input in the same order as in a model previously fitted to explain
variance in species richness between fertilizer treatments: pH, nitrogen addition
(three levels; +48, +96 or +144 kg N ha⁻¹) and +phosphorus, before inputting
N deposition as a final continuous explanatory variable as either a 3- or a 5-year
moving average. Two further fixed effects were initially included, +potassium
and whether the plot was a transition plot, but neither significantly explained any
additional variance in any of the diversity metrics. Plot 13 was the only farmyard
manure plot in the data set and was not included in this analysis. Both the GLMs
and the GLMMs were run using the software GenStat.
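N deposition enters the models as a 3- or 5-year moving average. The text does not say whether the window is trailing (the current and preceding years) or centred; the sketch below assumes a trailing window, which uses only deposition that could already have affected the sward by the time of sampling. Years without a complete window return None, and the deposition series is hypothetical.

```python
def trailing_moving_average(values, window):
    """Trailing mean over `window` consecutive values; None until the window is full."""
    if window < 1:
        raise ValueError("window must be >= 1")
    out = []
    running = 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            running -= values[i - window]  # drop the value leaving the window
        out.append(running / window if i >= window - 1 else None)
    return out


wet_n = [12.0, 11.0, 10.0, 9.5, 9.0, 8.0]  # hypothetical kg N/ha/yr, one value per year
print(trailing_moving_average(wet_n, 3))
print(trailing_moving_average(wet_n, 5))
```

A 5-year window smooths more but loses two more early years than a 3-year window, which is one practical reason to compare both.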

The temporal shifts in plant communities were analysed at the species level using
multivariate approaches. To investigate any directional response of the transition
plots following the cessation of nitrogen fertilization, the Bray–Curtis dissimilarity
index was calculated using the first sample date, 1991, as a reference point and
regression models fitted to the data using year as the explanatory variable. This was
also done for the plots that continued to receive nitrogen fertilizer with the expectation
that these plots would increasingly diverge from the transition plots with time.
Nonlinear regression with groups was used to quantify differences in the responses
of the Bray–Curtis index to time of the transition plots and those that had continued
to receive nitrogen. Redundancy analysis, using rainfall in the current growing
season as a covariate, was used to identify community shifts over time, using
GLMs to identify species that responded significantly to the first ordination axis,
constrained by year. This was done for each of the sub-plots separately for plots 3,
9/1, 9/2, 14/1 and 14/2. Finally, the species responding to withholding N fertilizer 
were compared with those driving the temporal responses using two additional 
partial canonical correspondence analyses. First, data from 1991-2012 for plots 
9 and 14 were analysed with year input as a categorical covariate and nitrogen 
as the explanatory variable. Second, the data from all the sub-plots sampled from 
1991-2012 were analysed, excluding the transition plots (9/1 and 14/1) with plot 
input as a covariate, and year as a continuous explanatory variable. Only species 
that were recorded at least three times at the level of the sub-plot were included in 
the analysis. Canoco 5 software was used for all the multivariate analyses.
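The Bray–Curtis analysis compares each later sample with the 1991 sample from the same plot and then regresses dissimilarity on year. A minimal sketch of those two steps, using species dry weights as abundances and an ordinary least-squares slope; the species codes follow Extended Data Fig. 3, but the abundances are invented for illustration, and the nonlinear regression with groups used in the paper is not reproduced.

```python
def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two {species: abundance} samples."""
    total = sum(a.values()) + sum(b.values())
    if total == 0:
        raise ValueError("both samples are empty")
    species = set(a) | set(b)
    shared = sum(min(a.get(s, 0.0), b.get(s, 0.0)) for s in species)
    return 1.0 - 2.0 * shared / total


def ols_slope(xs, ys):
    """Least-squares slope of ys on xs."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def dissimilarity_trend(samples_by_year, reference_year=1991):
    """Dissimilarity of each later year from the reference year, plus its trend."""
    ref = samples_by_year[reference_year]
    years = sorted(y for y in samples_by_year if y != reference_year)
    d = [bray_curtis(ref, samples_by_year[y]) for y in years]
    return years, d, ols_slope(years, d)


# Invented dry weights (g) for one transition sub-plot.
plot_samples = {
    1991: {"Alopr": 60.0, "Arrel": 30.0, "Tripr": 0.0, "Latpr": 10.0},
    1996: {"Alopr": 50.0, "Arrel": 30.0, "Tripr": 5.0, "Latpr": 15.0},
    2002: {"Alopr": 40.0, "Arrel": 25.0, "Tripr": 15.0, "Latpr": 20.0},
    2012: {"Alopr": 30.0, "Arrel": 20.0, "Tripr": 25.0, "Latpr": 25.0},
}
years, d, slope = dissimilarity_trend(plot_samples)
print(years, [round(x, 3) for x in d], round(slope, 5))
```

A positive slope indicates directional change away from the 1991 community; running the same function on a plot that kept receiving N gives the comparison the text describes.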
Extended Data Figure 1 | The Bray–Curtis dissimilarity index. a–h, The
response of Bray–Curtis dissimilarity for all sub-plots on plot 9 (a–d) and
plot 14 (e–h). Community data from 1992–2012 have been compared to
samples taken in 1991 for the transition plots (filled circles) and plots that

allowing a direct comparison between treatments at the main plot level.
Extended Data Figure 3 | Comparison of the effect of decreasing
atmospheric N inputs and the cessation of N fertilization on plant
communities. a, Partial canonical correspondence analysis (CCA) of
the effect of withholding nitrogen fertilizer on plant communities on
the transition plots 9/1 and 14/1 compared to the plots that continued
to receive N, 9/2 and 14/2. Data from all sub-plots during the modern
day sampling period (1991–2012) were used, and year was included as
a categorical covariate. b, Partial CCA of the temporal response of plant
communities on all sub-plots sampled during the modern day period,
1991–2012, excluding the transition plots 9/1 and 14/1, with year entered
as a continuous variable and plot as a covariate. In both ordination plots,
species were only included if they were in the top 20 species ranked by
their weighting in the CCA and had a P value indicating their association
with the constrained axis of <0.1. Agrca, Agrostis capillaris; Alopr,
Alopecurus pratensis; Antsy, Anthriscus sylvestris; Arrel, Arrhenatherum
elatius; Conma, Conopodium majus; Dacgl, Dactylis glomerata; Hersp,
Heracleum sphondylium; Latpr, Lathyrus pratensis; Plala, Plantago
lanceolata; Poapr, Poa pratensis; Poatr, Poa trivialis; Ranac, Ranunculus
acris; Rumac, Rumex acetosa; Trapr, Tragopogon pratensis; Tripr, Trifolium
pratense.
Extended Data Table 1 | pH (in water) measured on soil cores, 0–23 cm, on seven occasions between 1991 and 2012

Plot Treatment Nov-91 Feb-95 Mar-98 Mar-02 Mar-05 Mar-08 Mar-11
ib = 6A 7.0 71 73 72 71 69 
id 1 36 42 40 44 40 38 40 
2/2b - 65 69 69 7A 75 73 74 
2/2b 48 5.0 52 54 52 54 54 
3a 64 74 71 73 73 72 72 
3b 7 64 65 65 63 64 64 63 
3c . 5.0 5.4 53 5.2 51 49 5.2 
3d 48 52 52 53 53 52 53 
7b 62 60 59 58 60 64 62 
7d PKNaMg 48 5.0 47 49 49 49 49 
Oita 57 65 65 69 7.0 71 71 
O/tb 52 59 64 64 64 64 64 
o/te (N2)PKNaMg 44 46 48 53 5.0 49 52 
avid 37 44 4.0 44 4.0 4.0 41 
O/2a 62 68 68 74 72 71 71 
9/2b 54 62 62 64 60 63 62 
9/2c N2PKNaMg 44 53 48 48 54 48 54 
o/2d 3.6 3.9 3.6 36 3.6 3.4 37 
10b 54 55 59 64 67 62 63 
40d N2PNaMg 34 38 37 37 37 35 37 
T11b 54 64 64 62 64 60 64 
t4ilid NaPKNaMg 32 37 35 36 35 34 3.6 
11/2b ont 52 59 56 63 65 64 64 
414/2d NaPKNaMgsi 34 3.9 37 3.6 3.6 3.4 3.6 
73/26 63 64 62 59 60 59 64 
43/2d PYMTPM 46 54 51 54 52 52 5.0 
141A 66 69 68 7.0 69 7.0 69 
14/1b . . 66 63 64 60 57 62 60 
44/ic (N*2)PKNaMg 58 58 56 55 54 53 53 
14/1d 56 58 56 56 56 56 54 
14/2a 68 68 69 7.0 7.0 7.0 70 
44/26 : 68 64 65 63 63 63 62 
44/2c N*2PKN aMg 6.0 64 64 60 64 60 59 
44/2d 58 57 64 6.0 64 64 60 
17b e 67 65 66 62 64 62 63 
17d 56 58 58 57 60 58 57 


Treatments: N1, N2 and N3: ammonium sulfate supplying 48, 96 and 144 kg N and 55, 110 and 165 kg S ha−1; N*1 and N*2: sodium nitrate supplying 48 and 96 kg N and 78 and 157 kg Na ha−1; (N2), (N*2): N last applied 1989; P: triple superphosphate supplying 35 kg P ha−1; K: potassium sulfate supplying 225 kg K and 99 kg S ha−1; Na: sodium sulfate supplying 15 kg Na and 10 kg S ha−1; Mg: magnesium sulfate (Epsom salts) supplying 10 kg Mg and 13 kg S ha−1; Si: water-soluble sodium silicate supplying 135 kg Si and 63 kg Na ha−1; FYM: 35 t farmyard manure ha−1 supplying ~240 kg N, 45 kg P, 350 kg K, 25 kg Na, 25 kg Mg, 40 kg S, 135 kg Ca ha−1; PM: pelleted poultry manure (replaced fishmeal in 2003) supplying ~65 kg N ha−1. Sub-plots a, b and c receive differential amounts of lime, if needed, every 3 years to maintain soil pH at 7, 6 and 5, respectively; sub-plot d receives no lime.
Extended Data Table 2 | Species richness and e^H′ observed in a total sample area of 0.75 m², averaged over three biomass samples taken in 1991–1993 and 2010–2012

Plot   Treatment     Species number (s.e.m.)   e^H′ (s.e.m.)
                     1991-93      2010-12      1991-93      2010-12
1b                   22 (1.5)     22 (0.3)     8.4 (0.42)   8.8 (1.13)
1d                   4 (0.6)      4 (0.3)      1.6 (0.23)   2.3 (0.19)
2/2b                 27 (0.9)     33 (3.2)     12.4 (0.48)  15 (1.19)
2/2d                 24 (1.2)     26 (0.6)     5.8 (0.64)   9.9 (0.7)
3a                   30 (1.2)     31 (1.2)     11.4 (0.58)  15.7 (0.33)
3b                   29 (0)       30 (1.7)     10.6 (0.85)  15.1 (0.22)
3c                   25 (3.1)     28 (1.2)     6.2 (0.42)   9.7 (1.21)
3d                   28 (2.7)     27 (0.9)     7.3 (0.43)   11.2 (1.5)
7b     PKNaMg        22 (0.9)     25 (1.7)     10.3 (0.58)  8.9 (1.31)
7d     PKNaMg        20 (0.3)     26 (1.5)     7.9 (0.38)   10.5 (0.91)
9/1a   (N2)PKNaMg    20 (1.3)     26 (0.9)     11 (0.46)    11.1 (0.51)
9/1b   (N2)PKNaMg    15 (1.2)     26 (0.3)     8.1 (0.69)   7.3 (0.79)
9/1c   (N2)PKNaMg    10 (0.8)     27 (0.3)     2.6 (0.66)   11.7 (0.79)
9/1d   (N2)PKNaMg    3 (0.3)      6 (1.2)      1.5 (0.07)   2.4 (0.17)
9/2a   N2PKNaMg      16 (1.5)     21 (0.6)     7 (0.22)     9.3 (0.64)
9/2b   N2PKNaMg      13 (0.9)     22 (0.3)     4.8 (0.24)   8.4 (0.46)
9/2c   N2PKNaMg      10 (0.9)     22 (0.3)     3.9 (0.76)   8.4 (0.39)
9/2d   N2PKNaMg      3 (0.6)      3 (0)        1.5 (0.2)    1.9 (0.03)
10b    N2PNaMg       9 (0.7)      15 (1)       5 (0.86)     5.5 (0.37)
10d    N2PNaMg       2 (0.3)      3 (0.3)      1.4 (0.05)   1.5 (0.4)
11/1b  N3PKNaMg      12 (1)       14 (0.3)     4.2 (0.4)    6.5 (0.47)
11/1d  N3PKNaMg      1 (0.3)      1 (0)        1 (0.01)     1 (0)
11/2b  N3PKNaMgSi    8 (0)        14 (0.6)     3.9 (0.26)   4.8 (0.24)
11/2d  N3PKNaMgSi    1 (0.3)      3 (0.3)      1 (0)        1.3 (0.11)
13/2b  FYM/PM        21 (0.3)     25 (1.5)     9.7 (0.76)   11.6 (0.59)
13/2d  FYM/PM        20 (1.2)     25 (0.9)     7.7 (0.34)   10.7 (0.51)
14/1a  (N*2)PKNaMg   20 (1.2)     25 (0.6)     8.7 (0.38)   10 (0.89)
14/1b  (N*2)PKNaMg   20 (1.5)     23 (1.5)     8.7 (1.34)   10.8 (0.78)
14/1c  (N*2)PKNaMg   18 (1)       26 (1.5)     7.8 (0.84)   12.6 (1.46)
14/1d  (N*2)PKNaMg   18 (0.9)     22 (0.6)     8.6 (0.08)   12.4 (0.33)
14/2a  N*2PKNaMg     17 (1.2)     22 (1.2)     4.3 (0.26)   9 (0.43)
14/2b  N*2PKNaMg     14 (1.5)     20 (0.6)     4.6 (0.42)   8.5 (0.62)
14/2c  N*2PKNaMg     14 (1.7)     20 (1.2)     4.2 (0.6)    9.4 (0.08)
14/2d  N*2PKNaMg     16 (0.7)     19 (0.9)     5.6 (0.52)   9.8 (0.31)
17b                  23 (3)       27 (2.4)     8.2 (1.28)   10.8 (1.66)
17d                  25 (1.9)     29 (0.9)     10.1 (0.52)  10.8 (0.46)


Sub-plots a and b have been receiving lime since the early 1900s, whereas lime has only been added to sub-plot c since 1965; sub-plot c also shows the largest increases in species richness over the modern-day sampling period, possibly reflecting a continuing slow recovery from low soil pH. Treatments: N1, N2 and N3: ammonium sulfate supplying 48, 96 and 144 kg N and 55, 110 and 165 kg S ha−1; N*1, N*2: sodium nitrate supplying 48 and 96 kg N and 78 and 157 kg Na ha−1; (N2), (N*2): N last applied 1989; P: triple superphosphate supplying 35 kg P ha−1; K: potassium sulfate supplying 225 kg K and 99 kg S ha−1; Na: sodium sulfate supplying 15 kg Na and 10 kg S ha−1; Mg: magnesium sulfate (Epsom salts) supplying 10 kg Mg and 13 kg S ha−1; Si: water-soluble sodium silicate supplying 135 kg Si and 63 kg Na ha−1; FYM: 35 t farmyard manure ha−1 supplying ~240 kg N, 45 kg P, 350 kg K, 25 kg Na, 25 kg Mg, 40 kg S, 135 kg Ca ha−1; PM: pelleted poultry manure (replaced fishmeal in 2003) supplying ~65 kg N ha−1. Sub-plots a, b and c receive differential amounts of lime, if needed, every 3 years to maintain soil pH 7, 6 and 5, respectively; sub-plot d receives no lime.
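Extended Data Tables 2 to 6 report e^H′ alongside species number without defining it. Assuming e^H′ is the exponential of the Shannon index H′, computed with natural logarithms from relative biomass, it can be read as an "effective number of species": it equals the species count when biomass is spread evenly and falls towards 1 as one species dominates, which is consistent with the single-species rows (for example 11/1d) showing e^H′ = 1. The function name below is hypothetical and this is a sketch under that assumption, not the authors' code.

```python
import math


def richness_and_exp_shannon(biomass):
    """Return (species richness, e^H') from per-species biomass values.

    Assumes H' = -sum(p_i * ln(p_i)), with p_i the relative biomass of
    species i; species with zero biomass are treated as absent.
    """
    present = [b for b in biomass if b > 0]
    total = sum(present)
    if total == 0:
        return 0, 0.0
    shannon = -sum((b / total) * math.log(b / total) for b in present)
    return len(present), math.exp(shannon)


# Hypothetical biomass (g) of four species in one quadrat.
print(richness_and_exp_shannon([12.0, 6.0, 1.5, 0.5]))
```

Because e^H′ can never exceed the species count, the gap between the two columns of Extended Data Table 2 indicates how unevenly biomass is shared among the species present.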
Extended Data Table 3 | GLMs fitted to relative biomass data from the Park Grass nil plot, which has never received any external inputs of lime or fertilizer

Response variable (period)   Explanatory variable   Estimate (s.e.)   t statistic   Degrees of freedom   F probability
Species number (1991-2012)   Soil pH                3.2 (0.55)        5.75
                             Atm N                  (illegible)       (illegible)   49                   (illegible)
e^H′ (1991-2012)             Soil pH                3.2 (0.40)        8.01
                             Atm N                  -1.2 (0.32)       -3.69         (illegible)          (illegible)
Legumes (1903-2012)          Soil pH                0.74 (0.140)      5.34
                             Atm N                  -0.48 (0.089)     -5.37         (illegible)          (illegible)
Grass (1903-2012)            Soil pH                -0.56 (0.078)     -7.04
                             Atm N                  0.14 (0.041)      3.30          67                   <0.001
Other (1903-2012)            Soil pH                0.40 (0.069)      5.83          69                   <0.001


A normal distribution with an identity link was used for species number and e^H′, and a binomial distribution with a logit link for the proportions of legumes, grasses and other species (non-leguminous forbs), each analysed separately (over-dispersion was corrected for by estimating the dispersion parameter and using the F statistic to test for significance). For plant functional groups, data were available from 1903 but, because of changes in the sampling protocol, only data from the recent sampling period (1991–2012) were analysed for species richness and e^H′.
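The functional-group models above are binomial GLMs with a logit link, corrected for over-dispersion by estimating a dispersion parameter. The source does not say which estimator was used; the dependency-free sketch below assumes the common quasi-likelihood choice, the Pearson statistic divided by residual degrees of freedom (φ = X²/(n − 2)), and fits a single covariate by iteratively reweighted least squares (IRLS). The function names are hypothetical, the F test is not reproduced, and R's `glm` with a quasibinomial family would be the usual tool in practice.

```python
import math


def _irls_sums(x, y, b0, b1):
    """Weighted sums for one IRLS step of a logit-link binomial GLM with one covariate."""
    s_w = s_wx = s_wxx = s_wz = s_wxz = pearson = 0.0
    for xi, yi in zip(x, y):
        eta = b0 + b1 * xi                  # linear predictor
        mu = 1.0 / (1.0 + math.exp(-eta))   # inverse logit
        w = mu * (1.0 - mu)                 # working weight (binomial variance)
        z = eta + (yi - mu) / w             # working response
        s_w += w
        s_wx += w * xi
        s_wxx += w * xi * xi
        s_wz += w * z
        s_wxz += w * xi * z
        pearson += (yi - mu) ** 2 / w       # Pearson chi-square contribution
    return s_w, s_wx, s_wxx, s_wz, s_wxz, pearson


def quasi_binomial_logit(x, y, max_iter=100, tol=1e-12):
    """Fit logit(E[y]) = b0 + b1 * x to proportions y in [0, 1].

    Returns (b0, b1, se_b1, phi): phi = X^2 / (n - 2) is the Pearson dispersion
    estimate, and se_b1 is the slope's standard error scaled by phi.
    """
    b0 = b1 = 0.0
    for _ in range(max_iter):
        s_w, s_wx, s_wxx, s_wz, s_wxz, _ = _irls_sums(x, y, b0, b1)
        det = s_w * s_wxx - s_wx * s_wx
        # Solve the 2x2 weighted least-squares normal equations.
        new_b0 = (s_wxx * s_wz - s_wx * s_wxz) / det
        new_b1 = (s_w * s_wxz - s_wx * s_wz) / det
        converged = abs(new_b0 - b0) < tol and abs(new_b1 - b1) < tol
        b0, b1 = new_b0, new_b1
        if converged:
            break
    s_w, s_wx, s_wxx, _, _, pearson = _irls_sums(x, y, b0, b1)
    phi = pearson / (len(x) - 2)
    det = s_w * s_wxx - s_wx * s_wx
    se_b1 = math.sqrt(phi * s_w / det)      # sqrt(phi * [(X'WX)^-1]_11)
    return b0, b1, se_b1, phi


# Hypothetical soil pH values and legume biomass proportions.
ph = [4.8, 5.0, 5.2, 5.4, 5.6, 5.8, 6.0, 6.2]
legume = [0.02, 0.03, 0.05, 0.04, 0.08, 0.10, 0.09, 0.15]
print(quasi_binomial_logit(ph, legume))
```

The t statistics in the table correspond to estimate / s.e., so scaling the standard error by φ is what makes over-dispersed proportion data yield more conservative tests.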
Extended Data Table 4 | Effect of declining N deposition (measured either as a 3- or 5-year moving average), pH and N or P fertilizers on metrics of plant diversity

Response variable      Explanatory variable   Estimate (s.e.)   F statistic   Degrees of freedom   P value
Species number         pH                     0.20 (0.035)      44.2          366                  <0.001
                       +N 48 kg ha−1          -0.92 (0.346)     11.5          27                   <0.001
                          96 kg ha−1          -0.47 (0.210)
                          144 kg ha−1         -1.28 (0.467)
                       +Phosphorus            ns                2.6           26                   0.120
                       N deposition           -0.09 (0.028)     9.8           11                   0.010
e^H′                   pH                     0.18 (0.044)      28.3          134                  <0.001
                       +N 48 kg ha−1          -0.71 (0.291)     11.3          23                   <0.001
                          96 kg ha−1          -0.47 (0.174)
                          144 kg ha−1         -1.14 (0.389)
                       +Phosphorus            ns                0.4           21                   0.514
                       N deposition           -0.08 (0.033)     6.9           11                   0.023
Proportion of legumes  pH                     0.55 (0.151)      13.9          298                  <0.001
                       +N 48 kg ha−1          -2.87 (1.739)     6.8           43                   <0.001
                          96 kg ha−1          -2.33 (0.625)
                          144 kg ha−1         -5.08 (2.304)
                       +Phosphorus            2.23 (0.672)      11.1          23                   0.003
                       N deposition           -0.55 (0.185)     8.8           10                   0.014
Proportion of grass    pH                     -0.85 (0.088)     110.4         115                  <0.001
                       +N 48 kg ha−1          1.07 (0.265)      31.7          24                   <0.001
                          96 kg ha−1          1.63 (0.489)
                          144 kg ha−1         3.04 (0.265)
                       +Phosphorus            ns                0.0           19                   0.997
                       N deposition           0.27 (0.122)      4.8           1                    0.050


Effect size of additional N fertiliser expressed in relation to plots receiving no added nitrogen 


GLMMs were fitted to the data from all sub-plots sampled in the modern-day period (1991–2012), with the exception of the one plot that receives organic manures, with plot and year as random factors. A Poisson distribution with a log link was used for species richness and e^H′, and a binomial distribution with a logit link for the proportions of legumes and grasses, each analysed separately. Where necessary, over-dispersion was corrected for by estimating the dispersion parameter and using the F statistic to calculate P. Two additional variables, +potassium and whether the data were from a transition plot, did not explain any variance in the response variables in any of the models.
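Because species number in Extended Data Table 4 was modelled with a log link, each estimate is additive on the log scale, and exponentiating it gives a multiplicative effect on the expected count. For example, exp(−0.92) ≈ 0.40, so plots receiving +N at 48 kg ha−1 are expected to hold about 0.40 times the species number of comparable plots without added N, other terms held constant. The short sketch below applies this to the species-number rows; the dictionary and function names are illustrative, and the units of the N-deposition variable are not given in the table. For the logit-link proportion models, exp(β) would instead be an odds ratio.

```python
import math

# Species-number estimates from Extended Data Table 4 (Poisson GLMM, log link).
SPECIES_NUMBER_ESTIMATES = {
    "pH (per unit)": 0.20,
    "+N 48 kg/ha vs no N": -0.92,
    "+N 96 kg/ha vs no N": -0.47,
    "+N 144 kg/ha vs no N": -1.28,
    "N deposition (per unit)": -0.09,
}


def multiplicative_effect(beta):
    """Factor by which the expected count changes per unit of the predictor (log link)."""
    return math.exp(beta)


for name, beta in SPECIES_NUMBER_ESTIMATES.items():
    factor = multiplicative_effect(beta)
    print(f"{name}: x{factor:.2f} ({(factor - 1.0) * 100:+.0f}%)")
```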
Extended Data Table 5 | GLMs fitted to relative biomass data from the Park Grass plot 9 (N2PKNaMg)

Plot 9/1
Response variable   Explanatory variable   Estimate (s.e.)   t statistic   Degrees of freedom   F probability
Species number      Soil pH                3.5 (0.74)        4.74
                    Atm N                  3.2 (0.60)        5.34          36                   <0.001
e^H′                Soil pH                2.6 (0.47)        5.56          37                   <0.001
Legumes             Atm N                  -0.94 (0.130)     -7.19         37                   <0.001
                    Soil pH                -0.66 (0.165)     -3.99
Grass               Atm N                  0.58 (0.109)      5.30          (illegible)          (illegible)
Other               Soil pH                0.54 (0.145)      3.69          37                   <0.001

Plot 9/2
Response variable   Explanatory variable   Estimate (s.e.)   t statistic   Degrees of freedom   F probability
Species number      Soil pH                1.7 (0.48)        3.63
                    Atm N                  -2.9 (0.38)       7.69          36                   <0.001
e^H′                Soil pH                1.3 (0.28)        4.77
                    Atm N                  -0.8 (0.22)       -3.62         36                   <0.001
Legumes             Atm N                  -0.73 (0.174)     -4.16         37                   <0.001
                    Soil pH                -0.37 (0.124)     3.02
Grass               Atm N                  0.30 (0.094)      3.20          (illegible)          (illegible)
                    Soil pH                0.74 (0.121)      6.12
Other               Rain                   0.002 (0.0008)    2.22          (illegible)          (illegible)


On plot 9/1, N was last applied as ammonium sulfate in 1989. Sub-plot d, with very low pH and species diversity, was excluded from the analysis. All data are for the recent sampling period, 1991–2012. A normal distribution with an identity link was used for species number and e^H′, and a binomial distribution with a logit link for the proportions of legumes, grasses and other species (non-leguminous forbs), each analysed separately (over-dispersion was corrected for by estimating the dispersion parameter and using the F statistic to test for significance).
Extended Data Table 6 | GLMs fitted to relative biomass data from the Park Grass plot 14 (N*2PKNaMg)

Plot 14/1
Response variable   Explanatory variable   Estimate (s.e.)   t statistic   Degrees of freedom   F probability
Species number      Atm N                  -2.0 (0.29)       6.70          50                   <0.001
e^H′                Soil pH                -1.0 (0.45)       -2.21
                    Atm N                  -0.8 (0.24)       3.41          49                   <0.001
Legumes             Atm N                  -0.47 (0.098)     -4.84         50                   <0.001
Grass               Atm N                  0.26 (0.075)      3.46          50                   <0.001
Other               Rain                   0.003 (0.0005)    5.44          50                   <0.001

Plot 14/2
Response variable   Explanatory variable   Estimate (s.e.)   t statistic   Degrees of freedom   F probability
Species number      Atm N                  -2.1 (0.40)       5.41          50                   <0.001
e^H′                Soil pH                -1.5 (0.57)       2.61
                    Atm N                  -1.3 (0.23)       5.51          (illegible)          <0.001
Legumes             Atm N                  -1.50 (0.225)     6.65          50                   <0.001
Grass               Atm N                  0.31 (0.073)      4.20          50                   <0.001
Other               Atm N                  -0.35 (0.076)     4.59          49                   <0.001
                    Rain                   0.003 (0.0007)    4.08


On plot 14/1, N was last applied as sodium nitrate in 1989. All data are for the recent sampling period, 1991–2012. A normal distribution with an identity link was used for species number and e^H′, and a binomial distribution with a logit link for the proportions of legumes, grasses and other species (non-leguminous forbs), each analysed separately (over-dispersion was corrected for by estimating the dispersion parameter and using the F statistic to test for significance).
Extended Data Table 7 | Significant responses of individual species to year at the level of the sub-plot
GLMs were used to quantify species responses, with precipitation before the first cut included as a covariate for all plots.
Sex-dependent dominance at a single locus maintains variation in age at maturity in salmon


Nicola J. Barson*, Tutku Aykanat*, Kjetil Hindar, Matthew Baranski, Geir H. Bolstad, Peder Fiske, Céleste Jacq, Arne J. Jensen, Susan E. Johnston, Sten Karlsson, Matthew Kent, Thomas Moen, Eero Niemelä, Torfinn Nome, Tor F. Næsje, Panu Orell, Atso Romakkaniemi, Harald Sægrov, Kurt Urdal, Jaakko Erkinaro, Sigbjørn Lien & Craig R. Primmer


Males and females share many traits that have a common genetic basis; however, selection on these traits often differs between the sexes, leading to sexual conflict. Under such sexual antagonism, theory predicts the evolution of genetic architectures that resolve this sexual conflict. Yet, despite intense theoretical and empirical interest, the specific loci underlying sexually antagonistic phenotypes have rarely been identified, limiting our understanding of how sexual conflict impacts genome evolution and the maintenance of genetic diversity. Here we identify a large-effect locus controlling age at maturity in Atlantic salmon (Salmo salar), an important fitness trait in which selection favours earlier maturation in males than females, and show it is a clear example of sex-dependent dominance that reduces intralocus sexual conflict and maintains adaptive variation in wild populations. Using high-density single nucleotide polymorphism data across 57 wild populations and whole genome re-sequencing, we find that the vestigial-like family member 3 gene (VGLL3) exhibits sex-dependent dominance in salmon, promoting earlier and later maturation in males and females, respectively. VGLL3, an adiposity regulator associated with size and age at maturity in humans, explained 39% of phenotypic variation, an unexpectedly large proportion for what is usually considered a highly polygenic trait. Such large effects are predicted under balancing selection from either sexually antagonistic or spatially varying selection. Our results provide the first empirical example of dominance reversal allowing greater optimization of phenotypes within each sex, contributing to the resolution of sexual conflict in a major and widespread evolutionary trade-off between age and size at maturity. They also provide key empirical evidence for how variation in reproductive strategies can be maintained over large geographical scales. We anticipate these findings will have a substantial impact on population management in a range of harvested species where trends towards earlier maturation have been observed.

The importance of balancing selection in maintaining variation in fitness-related traits, which are expected to be under strong selection, is a long-standing question in evolutionary biology, with recent models suggesting that balancing selection may be particularly important in maintaining genetic variation. Sexually antagonistic selection on traits with a shared genetic architecture, where each sex is displaced from its optimal phenotype, is one mechanism generating balancing selection. Theoretical models predict that dominance reversals, where the dominant allele in one sex is recessive in the other, would greatly reduce constraints on the resolution of sexual conflict and may be particularly efficient at maintaining variation through heterozygote superiority across the sexes, although this architecture has never been observed in the wild. A paucity of empirical examples with known genetic architecture means that the evolutionary consequences of sexual conflict, particularly its importance in maintaining adaptive variation, remain largely unknown.

The age at which an individual reproduces is a critical point in its life history. Age at maturity affects fitness traits including survival, size at maturity and lifetime reproductive success. Age at maturity in Atlantic salmon represents a classic evolutionary trade-off: larger, later-maturing individuals have higher reproductive success on spawning grounds, yet also have a higher risk of dying before first reproduction. Atlantic salmon reproduce in freshwater, with offspring migrating to sea to feed before returning to their natal river to spawn. The number of years spent at sea before spawning, namely their age at maturity, or 'sea age', has a dramatic impact on size at maturity: typically 1–3 kg and 50–65 cm after 1 year, compared with 10–20 kg and >100 cm after 3 or more years. Males mature earlier and at smaller size on average, whereas females mature later, with a stronger correlation between body size and reproductive success compared with males. There is evidence for sex-specific selection patterns on age at maturity, as life-history strategies differ considerably between males and females.

Figure 1 | Genetic mapping of age at maturity and divergence across populations. a, GWAS for age at maturity for the TAN and NOR data sets combined. Insets show the two data sets independently (Extended Data Fig. 2). b, Signatures of spatially divergent selection using the FLK FST outlier test (56 populations, total n = 1,404). Solid and dashed lines indicate the smoothed median and 99.5% quantile of the neutral distribution, respectively. Ten SNPs flanking the VGLL3TOP and SIX6TOP SNPs (filled symbols) are marked with red circles and triangles, respectively (P(VGLL3TOP) = 1.44 × 10°} and P(SIX6TOP) ≈ 0). c, The gene model and linkage disequilibrium plot of the ~0.5 Mb region around the significant region on chromosome 25. Notable SNPs are colour coded red (VGLL3TOP), blue (VGLL3iHS) and green (SNPs tagging missense mutations in VGLL3 and AKAP11). Shorter tick marks on the SNP axis indicate re-sequencing variants.

We investigated the genetic basis of age at maturity in Atlantic salmon using two independent data sets. The first, Tana (TAN), included two subpopulations from a large river system (Tana/Teno River; 68–70° N; n = 463); the second, Norway (NOR), comprised 54 populations spanning the Norwegian coast from 59° N to 71° N, containing both Atlantic and Barents/White Sea phylogeographic lineages (n = 941; NOR mean n per population = 17.4). Both data sets sampled geographically proximate populations with contrasting ages at maturity (Extended Data Fig. 1, Supplementary Information and Extended Data Table 1). Genome-wide association studies (GWAS) for age at maturity were conducted within both data sets using 208,704 single nucleotide polymorphisms (SNPs) (Supplementary Note). A region spanning approximately 100 kb on chromosome 25 was strongly associated with age at maturity in both data sets (GWAS; P < 1 × 10−9; Fig. 1a, c and Extended Data Fig. 2), explaining 39.4% (standard error 1.1%) of the total phenotypic variation. This association was further validated in a phylogeographically distant Baltic Sea data set (BAL) (P < 9.74 × 10~®; Extended Data Fig. 3), confirming that the region is evolutionarily conserved across all European lineages. The region included two candidate loci (Fig. 1c and Extended Data Fig. 4a), VGLL3 and A-kinase anchor protein 11 (AKAP11). VGLL3 is a transcription cofactor with a role in adipogenesis as a negative regulator of terminal adipocyte differentiation, and its expression is correlated with body weight and gonadal adipose content in mice. VGLL3 has also been associated with age at menarche and pubertal height growth in humans, indicating a remarkably high level of functional conservation. Age at menarche is associated with adiposity in humans, and puberty in fish is linked to the absolute level or rate of accumulation of lipid reserves. Threshold levels of fat reserves at critical times of year are thought to control the initiation of maturation in salmon. Therefore, VGLL3 may regulate the interaction between fat reserves (adiposity) and maturation in salmon in a similar manner to mammals, and is a strong candidate gene for age at maturity. AKAP11 is expressed throughout spermatogenesis and is important for mature sperm motility. Targeted whole genome re-sequencing of 32 individuals from seven populations revealed two missense mutations in VGLL3 in strong linkage disequilibrium with a nearby highly associated genic SNP (VGLL3 14) and with each other (Met54Thr–VGLL3 14 r² = 1; Asn323Lys–VGLL3 14 r² = 0.72; Met54Thr–Asn323Lys r² = 0.72; Extended Data Fig. 4a), and confirmed that a missense SNP had been genotyped in AKAP11 (Fig. 1c and Extended Data Fig. 4a). A test for predicting changes in protein structure and function (PolyPhen2, see Methods) strongly supported two of these mutations having an effect on phenotype, owing to high evolutionary conservation of the codons (VGLL3 Asn323Lys: naive Bayes posterior probability = 0.976, sensitivity = 0.76, specificity = 0.96; AKAP11 Val214Met: naive Bayes posterior probability = 0.716, sensitivity = 0.86, specificity = 0.92).
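The r² values quoted for the VGLL3 variants measure linkage disequilibrium between pairs of biallelic sites. As a minimal sketch, assuming phased haplotypes with alleles coded 0/1 (the source does not state whether r² was estimated from phased haplotypes or unphased genotypes), r² is the squared correlation of alleles across haplotypes, r² = D² / (p₁(1 − p₁)q₁(1 − q₁)) with D = p₁₁ − p₁q₁. The function name is illustrative.

```python
def ld_r2(haplotypes):
    """r^2 between two biallelic SNPs from phased haplotypes.

    Each haplotype is a pair (allele_at_snp1, allele_at_snp2) coded 0/1.
    D = p11 - p1 * q1, and r^2 = D^2 / (p1 (1 - p1) q1 (1 - q1)).
    """
    n = len(haplotypes)
    p1 = sum(a for a, _ in haplotypes) / n    # freq. of allele 1 at SNP 1
    q1 = sum(b for _, b in haplotypes) / n    # freq. of allele 1 at SNP 2
    p11 = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p11 - p1 * q1
    denom = p1 * (1 - p1) * q1 * (1 - q1)
    if denom == 0:
        raise ValueError("r^2 is undefined when either SNP is monomorphic")
    return d * d / denom


# Two SNPs whose alleles always travel together give r^2 = 1.
print(ld_r2([(0, 0), (1, 1), (0, 0), (1, 1)]))
```

An r² of 1, as reported for Met54Thr and the associated genic SNP, means one site perfectly predicts the other in the sample, so association signals at the two cannot be separated statistically.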

A second genomic region, which spans 250 kb on chromosome 9, was strongly associated with age at maturity (P < 10~”°; Extended Data Figs 2a and 4b, c, and Supplementary Information), but was no longer significant after correction for population stratification (Extended Data Fig. 2). This signal is likely to represent between-population variation in a correlated trait, size at maturity (Supplementary Information, Extended Data Fig. 2c–e). The core haploblock included a transcription factor of the hypothalamus–pituitary–gonadal axis, SIX6, associated with size and age at maturity in humans, and a conserved non-coding element that aligns to a candidate distal forebrain enhancer of SIX6 (Extended Data Fig. 4b, c and Supplementary Information). Both genome regions also exhibited strong signals of spatially divergent selection across populations (FLK FST outlier test, P < 10~'; Fig. 1b).

Figure 2 | Genetic architecture of age at maturity at the VGLL3TOP locus. a, Odds ratio (median) between the alternative homozygous genotypes for delaying maturation in females (n = 693, red) and males (n = 711, blue). Error bars are 50% sampling quantiles (100,000 parametric permutations). All odds ratios are significantly different from 1 (P < 0.001). b, Probability of delaying maturation as a function of VGLL3TOP genotype in females (red) and males (blue). Dominance estimates on the liability scale are given for each sex (see also Extended Data Fig. 5). Note that the 2–3 year category in females did not deviate significantly from additivity on the observed scale.

Two alleles at the most highly associated SNP in the VGLL3 locus (VGLL3TOP) conferred either early (E) or late (L) maturation. LL individuals had significantly higher odds of delaying maturation, particularly for older maturity ages (Fig. 2a), and were predicted to mature, on average, 0.87 (females) and 0.86 (males) years later than EE individuals (Fig. 3a), a remarkable shift considering that the average sea age at maturity in salmon is 1.6 years (population averages range from 1.0 to 2.6). This locus also influenced the size of individuals with the same age at maturity in both sexes, with a genotype-by-maturity interaction in males: for example, length = 100 and 80 cm for LL and EE males, respectively, maturing after 3 years at sea (P = 0.006; Fig. 3b and Supplementary Table 1). In addition, there were striking differences in dominance patterns between the sexes: in females the L allele was partly dominant across threshold categories (δ = 0.26 ± 0.13, P = 0.028), whereas in males the E allele was completely dominant (δ = −0.96 ± 0.17, P < 0.001; Figs 2b and 3a, Extended Data Fig. 5 and Extended Data Table 2), providing a compelling mechanism contributing to the larger proportion of males exhibiting an early maturing phenotype compared with females. Variation at VGLL3 was maintained in all but 1 of the 54 NOR populations, with all populations characterized by large salmon maintaining intermediate allele frequencies, consistent with balancing selection (Extended Data Table 1 and Extended Data Fig. 2c). Given that a large proportion of variation in age at maturity (and subsequent body size) is governed by a single locus of large effect, such sex-specific trade-offs with a shared genetic basis could effectively maintain genetic variation under differing patterns of dominance between the sexes.
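The δ values above place heterozygotes on a scale from −1 (E fully dominant) through 0 (additive) to 1 (L fully dominant). A minimal sketch of the conventional definition δ = d/a, with a half the difference between homozygote means and d the heterozygote's deviation from their midpoint, is shown below. It takes hypothetical genotype means as input; the paper estimated δ on the liability scale of its ordinal model, which this sketch does not reproduce, and the function names are illustrative.

```python
def dominance_coefficient(mean_ee, mean_el, mean_ll):
    """Scaled dominance delta = d / a from genotype means on one scale.

    a = (mean_ll - mean_ee) / 2 : half the difference between homozygotes
    d = mean_el - (mean_ee + mean_ll) / 2 : heterozygote deviation from their midpoint
    delta = 0 means additivity, 1 means L fully dominant, -1 means E fully dominant.
    """
    a = (mean_ll - mean_ee) / 2.0
    if a == 0:
        raise ValueError("homozygote means are equal; dominance is undefined")
    d = mean_el - (mean_ee + mean_ll) / 2.0
    return d / a


def heterozygote_position(delta):
    """Fraction of the EE-to-LL distance covered by the heterozygote mean."""
    return (1.0 + delta) / 2.0


print(heterozygote_position(0.26), heterozygote_position(-0.96))
```

Under this convention the female estimate (δ = 0.26) puts EL heterozygotes about 63% of the way from EE towards LL, whereas the male estimate (δ = −0.96) puts them only about 2% of the way, so heterozygous males behave almost like EE males.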

Figure 3 | Effect of the VGLL3TOP genotype on age at maturity and size. a, Age at maturity (in years) of females (n = 693, red) and males (n = 711, blue) in relation to VGLL3TOP genotype. Circle areas are proportional to sample size. Black dots indicate predicted average sea age using a logit transformation model, and error bars are 50% sampling quantiles (10,000 parametric permutations). b, VGLL3TOP genotypic effect on size within maturation age classes. The average lengths (in centimetres) of females (red) and males (blue) maturing after 1, 2 or 3 years are indicated by the lower, middle and upper three dots, respectively. Length (in centimetres) on the y axis is log scaled and corrected for population effects. Circle diameters are proportional to sample size, and lines indicate sample s.d.

The large effect sizes of the VGLL3 alleles are consistent with evolutionary theory, which predicts that beneficial alleles of intermediate-to-large effect are likely to be maintained under balancing selection, particularly when their phenotypic and fitness effects differ between the sexes. Evolution towards complex traits controlled by fewer loci with larger effects is also predicted where gene flow between environments with different trait optima results in balancing selection. We investigated whether spatially varying selection on VGLL3, suggested by the FLK FST outlier analysis (Fig. 1b), was consistent with selection towards local optima. We found a strong effect of a population's average age at maturity on the integrated haplotype score (iHS), a measure of the amount of extended haplotype homozygosity around one allele of a SNP relative to the alternative allele (slope = −1.18 ± 0.27 standard error per year, R² = 0.302, P = 7.6 × 10−5; Fig. 4, Extended Data Fig. 6 and Supplementary Information). In populations with an older average age at maturity, there were relatively higher levels of extended homozygosity around the L allele than around the E allele, whereas the pattern was the opposite in populations with a younger average age at maturity (Fig. 4 and Extended Data Fig. 6). This result suggests a systematic shift in selection pressure on earlier and later maturation alleles that tracks the population's average age at maturity, and it is consistent with divergent selection among populations towards local optima (Supplementary Information) and with an effect of gene flow on the observed genetic architecture.
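iHS contrasts how far haplotype homozygosity extends around each allele of a core SNP. The sketch below computes an unstandardized score, ln(iHH_E / iHH_L), from phased 0/1 haplotypes, assuming common conventions that this section does not confirm: extended haplotype homozygosity (EHH) is integrated over physical distance by the trapezoid rule, and integration on each side stops once EHH falls below 0.05. The sign is oriented so that positive values mean longer haplotypes around E, matching the description of Fig. 4. Published iHS additionally standardizes the log ratio against SNPs of similar allele frequency, which is omitted here; all function names are hypothetical.

```python
import math
from collections import Counter


def ehh(haplotypes, core, end):
    """EHH between marker indices `core` and `end` (inclusive) for one allele class:
    the probability that two randomly chosen haplotypes are identical over the span."""
    n = len(haplotypes)
    if n < 2:
        return 0.0
    lo, hi = min(core, end), max(core, end)
    counts = Counter(tuple(h[lo:hi + 1]) for h in haplotypes)
    return sum(c * (c - 1) for c in counts.values()) / (n * (n - 1))


def ihh(haplotypes, positions, core, cutoff=0.05):
    """Integrate EHH over distance on both sides of the core (trapezoid rule),
    stopping on each side at the first marker where EHH drops below `cutoff`."""
    total = 0.0
    for step in (1, -1):
        prev_pos, prev_ehh = positions[core], 1.0  # carriers all share the core allele
        j = core + step
        while 0 <= j < len(positions):
            e = ehh(haplotypes, core, j)
            total += 0.5 * (prev_ehh + e) * abs(positions[j] - prev_pos)
            if e < cutoff:
                break
            prev_pos, prev_ehh = positions[j], e
            j += step
    return total


def unstandardized_ihs(haplotypes, positions, core, allele_e, allele_l):
    """ln(iHH_E / iHH_L): positive when haplotypes carrying E extend further."""
    carriers_e = [h for h in haplotypes if h[core] == allele_e]
    carriers_l = [h for h in haplotypes if h[core] == allele_l]
    return math.log(ihh(carriers_e, positions, core) / ihh(carriers_l, positions, core))


# Toy data: E-carrying haplotypes are identical, L-carrying ones break up quickly.
positions = [0, 10, 20, 30, 40]
e_haps = [[0, 0, 0, 0, 0]] * 3
l_haps = [[0, 0, 1, 0, 0], [1, 1, 1, 1, 1], [0, 1, 1, 1, 0], [1, 0, 1, 0, 1]]
print(unstandardized_ihs(e_haps + l_haps, positions, core=2, allele_e=0, allele_l=1))
```

In this toy example haplotypes around E stay intact, so the score is positive, the pattern the text associates with populations that mature at a younger average age.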

Figure 4 | Relationship between population iHS score (46 populations, 32 haplotypes per population) and average maturation age of each population for the VGLL3iHS locus. iHS = 0 (no haplotype length difference) is marked with a horizontal grey line. Positive iHS values indicate longer haplotype blocks, and therefore stronger selection, around the E allele in a population relative to the L allele, and vice versa for negative iHS values.

Our results reveal a major effect locus determining age at maturity in Atlantic salmon. The large effect of this locus is remarkable given that age at maturity is generally considered a classic polygenic trait. A shared gene controlling age at maturity in mammals and a teleost fish provides evidence for evolutionary conservation of a life-history trait across large taxonomic distances, as observed for morphological characters. Our results provide the first empirical example of dominance reversal allowing greater optimization of phenotypes within each sex. Partial dominance of the higher-fitness allele in each sex can result in a net effect of heterozygote superiority across the sexes, and thus maintain stable polymorphisms. In common with many other species, Atlantic salmon lack heteromorphic sex chromosomes, which precludes the use of the X chromosome to protect sexual conflict polymorphisms. Sex-dependent dominance removes the restrictive conditions on maintaining conflict alleles on autosomes, making sexually antagonistic polymorphism more likely to be maintained on autosomes than on the X chromosome. In line with our results, restrictive conditions on the maintenance of variation by balancing selection suggest that fewer, large-effect loci will control traits under both sexual antagonism and spatially varying selection. The discovery of a major locus affecting age at maturity will have a substantial impact on population management of Atlantic salmon, where a decrease in the frequency of late maturation has been observed in many populations, and potentially of other exploited species showing comparable shifts towards earlier maturation, if this architecture is similar in those species.


Online Content: Methods, along with any additional Extended Data display items and Source Data, are available in the online version of the paper; references unique to these sections appear only in the online paper.
METHODS


Study design and study material. Norway data set (NOR). Individuals were sampled from populations spanning the Norwegian coast from the Skagerrak in the south (59° N) to the Barents Sea in the north (71° N). In total 54 populations were sampled (n = 941), including 11 populations within the Barents/White Sea phylogeographic group, the remainder belonging to the Atlantic phylogeographic group (Extended Data Table 1). Samples were initially filtered to remove any individuals with possible aquaculture escapee ancestry.

Tana data set (TAN). Two sub-populations, occurring in sympatry in the mainstem Tana River, were subjected to in-depth within-population sampling (n = 463). Scales were collected from salmon harvested by local fishermen using a variety of methods (nets, rod) from 2001 to 2003. In total, 326 and 137 individuals from the two sub-populations, respectively, were included.

Baltic Sea data set. The BAL data set included 114 individuals from the Tornio River (66–69° N, 19–25° E) (Extended Data Fig. 1, Extended Data Table 1). This population belongs to the phylogeographically distinct Baltic Sea lineage. Scales were collected from individuals harvested by trained anglers from 2003 to 2005. Storage and phenotypic measurements were as described for the TAN samples.

In total, the study included 1,518 Atlantic salmon individuals after initial data filtering that removed low-quality samples and individuals with signs of aquaculture escapee ancestry (Extended Data Table 1).
Phenotypic measurements. Length at capture (LEN) and weight at capture (WGT) were recorded during sampling. The sex (SEX) of most individuals was determined genetically in the NOR and TAN data sets, while phenotypic sex determination was used for a small subset of samples (15% and 0.3% for the TAN and NOR data sets, respectively). Similar to tree rings, scale growth in fishes is commonly used to infer individual growth and age. Growth, freshwater age (that is, age before sea migration, FW Age) and years spent at sea before first sexual maturation and spawning, referred to here as age at maturity (Mat Age), were inferred from scales using internationally agreed guidelines for Atlantic salmon scale reading. Early life-history growth traits and size were assessed for their influence on age at maturity, as it has been shown that freshwater growth may be negatively correlated with seawater growth and that freshwater size may be positively correlated with age at maturity. Freshwater size (FWS), freshwater growth (FWG) and first-year growth at sea (SWG) were derived from the scale data and used as independent variables throughout the analyses. Freshwater size is the log radius of the scale from the scale centre to the end of the freshwater growth period, and growth at sea is the log radius of the scale from the end of freshwater growth to the end of the first winter annulus at sea. Freshwater growth depends positively on freshwater size and negatively on freshwater age. The residuals of the linear regression between freshwater size and freshwater age were further corrected for freshwater age to obtain the freshwater growth metric. As expected, a model in which freshwater size was nested within freshwater age explained 97% of freshwater growth (analysis of variance, P < 10~'*). Size at the end of the first year at sea (SWS) was completely determined by freshwater size and growth at sea, and therefore was not explored as an independent variable, to avoid collinearity.
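The freshwater growth metric starts from the residuals of a linear regression of freshwater size on freshwater age. A minimal sketch of that residual step, assuming ordinary least squares with an intercept, is given below; the further correction for freshwater age mentioned above is not described in enough detail to reproduce and is omitted, and the function name and data are hypothetical.

```python
def ols_residuals(x, y):
    """Residuals of y regressed on x by ordinary least squares (with intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must vary to fit a slope")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


# Hypothetical fish: freshwater age (years) and log scale radius at the end of freshwater growth.
fw_age = [2, 2, 3, 3, 4]
fw_size = [0.55, 0.60, 0.68, 0.72, 0.80]
print([round(r, 3) for r in ols_residuals(fw_age, fw_size)])
```

A positive residual marks a fish that is larger than expected for its freshwater age, i.e. one that grew faster in freshwater.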
Genotyping and data filtering. SNP array details. A custom 220,000 SNP 
Affymetrix Axiom array was used to genotype samples according to the manufac- 
turer’s instructions with a GeneTitan genotyping platform (Affymetrix). The SNPs 
on this array were a subset of those included on the 930K XHD Ssal array (dbSNP 
accession numbers ss1867919552-ss1868858426), and were chosen for maxi- 
mum informativeness on the basis of their SNPolisher performance (SNPolisher, 
version 1.4, Affymetrix), minor allele frequency (maf) in aquaculture samples 
(maf > 0.05) and physical distribution. The ascertainment bias of this array for 
wild Norwegian salmon is expected to be low because the aquaculture population was recently founded from a large number (n = 40) of Norwegian salmon populations. Within each population, the order of samples was randomized
with respect to age at maturity, and the genotype calling was conducted by an 
automated pipeline without knowledge of age at maturity status of each individual. 
No statistical methods were used to predetermine sample size. 

Raw genotyping data were analysed using the Linux-based APT pipeline apply- 
ing best practice thresholds (contrasts quality control (DQC) threshold, 0.82; 
STEP1, 0.97). After the initial sample filtering, markers with low maf (< 0.01) 
and/or call rate (< 0.97) were filtered out using the check.marker function in the 
GenABEL package (version 1.8.8) in the R environment (version 3.1.0) for the
GWAS. The same data parameters were used for the data set for phasing except 
that no maf threshold was set (see below). We did not perform a Hardy-Weinberg 
equilibrium test, since the data set contained individuals from multiple popula- 
tions. After the filtering steps, 208,704 SNPs in 29 linkage groups remained in the 
analysis. An additional filtering was performed separately for the BAL data set 
(maf < 0.01), resulting in 167,410 SNPs remaining in this data set for the GWA 
analysis. 
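The marker filter described above (markers with maf < 0.01 or call rate < 0.97 removed) can be illustrated with a small stand-alone sketch. It is not GenABEL's check.marker; it assumes genotypes are coded as 0, 1 or 2 copies of one allele with None for a missing call, and the function names are hypothetical.

```python
def marker_passes(genotypes, min_maf=0.01, min_call_rate=0.97):
    """Return True if a SNP passes minor-allele-frequency and call-rate filters.

    genotypes -- per-individual calls coded 0/1/2 (allele count) or None (missing)
    Hypothetical re-implementation for illustration only.
    """
    called = [g for g in genotypes if g is not None]
    if not genotypes or not called:
        return False
    call_rate = len(called) / len(genotypes)
    allele_freq = sum(called) / (2 * len(called))  # frequency of the counted allele
    maf = min(allele_freq, 1 - allele_freq)
    return maf >= min_maf and call_rate >= min_call_rate


def filter_markers(genotype_matrix, **thresholds):
    """Keep the SNP IDs (dict keys) whose genotype vectors pass marker_passes()."""
    return [snp for snp, calls in genotype_matrix.items()
            if marker_passes(calls, **thresholds)]


snps = {
    "snp_ok": [0, 1, 2, 1] * 25,            # 100 calls, maf 0.5
    "snp_monomorphic": [0] * 100,           # maf 0
    "snp_missing": [1] * 90 + [None] * 10,  # call rate 0.90
}
print(filter_markers(snps))
```

Passing `min_maf=0.0` reproduces the phasing data set, for which no maf threshold was applied.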
Model selection and GWAS of maturation age. We performed a GWAS of age at
maturity using an additive cumulative proportional odds model with the R package 
ordinal, where the age at maturity propensity of a genotype was evaluated using a 
logit link model, and a flexible threshold structure (Supplementary Information). 
In addition to analysing the NOR and TAN data sets separately, model selection 
was performed with the two data sets combined (NOR + TAN, n = 1,404), including geographical coordinates as parameters. Model selection was not performed for the BAL samples, for which freshwater phenotypic information was not available; for these we performed GWA analysis without phenotypic covariates (that is, the BASIC model). Supplementary Table 2 lists the details of the model selection
parameters for TAN, NOR and the combined (TAN + NOR) data sets. 
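The cumulative proportional odds model with a logit link can be made concrete by computing maturity-class probabilities from threshold and genotype parameters. The sketch below assumes the common parameterization logit P(Mat Age ≤ j) = θ_j − β·g, with g the number of L alleles (additive coding). It only evaluates the model for given, invented parameter values and does not fit it; the study fitted it with the R package ordinal.

```python
import math


def logistic(x):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-x))


def category_probabilities(thresholds, beta, n_late_alleles):
    """Probabilities of each maturity class under a cumulative logit model.

    thresholds     -- strictly increasing cut points theta_1 < ... < theta_{K-1}
    beta           -- assumed additive effect per late (L) allele on the latent scale
    n_late_alleles -- 0 (EE), 1 (EL) or 2 (LL)
    Returns K probabilities for maturing after 1, 2, ..., K sea winters.
    """
    if any(b <= a for a, b in zip(thresholds, thresholds[1:])):
        raise ValueError("thresholds must be strictly increasing")
    eta = beta * n_late_alleles  # linear predictor with additive coding
    cumulative = [logistic(t - eta) for t in thresholds] + [1.0]
    probs, previous = [], 0.0
    for c in cumulative:
        probs.append(c - previous)  # P(class j) = P(<= j) - P(<= j - 1)
        previous = c
    return probs


# Invented parameters: two thresholds (three classes), beta = 1 per L allele.
for genotype, g in (("EE", 0), ("EL", 1), ("LL", 2)):
    print(genotype, [round(p, 3) for p in category_probabilities([0.0, 2.0], 1.0, g)])
```

A positive β shifts probability mass towards later maturity classes, which is how the "propensity" of a genotype enters the model.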

We performed all GWA analyses with the BASIC model in addition to the FULL 
model to assess the effect of inclusion of covariates on the model (for example 
ref. 45; see Extended Data Fig. 2). The GWA analysis used a model comparison 
approach, where the effect of the SNP loci was evaluated by comparing the likeli- 
hood of the observational level model (as above) to the additive genetic model with 
SNP loci as covariates. Genome-wide statistical significance was adjusted for multiple comparisons and genomic inflation (λ) for each analysis (P = 0.05 × nSNP × λ).
Specific significance thresholds are listed in Extended Data Fig. 2. 
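Genomic inflation (λ) is conventionally estimated as the median of the observed 1-df chi-square association statistics divided by the median expected under the null (about 0.455). The sketch below computes λ from P values and, as one common genomic-control way of combining a Bonferroni threshold with λ, rescales the Bonferroni chi-square cut-off by λ. This combination is an assumption for illustration and is not necessarily the exact formula behind the thresholds in Extended Data Fig. 2.

```python
import math
from statistics import NormalDist, median

_STD_NORMAL = NormalDist()


def chi2_1df_from_p(p):
    """1-df chi-square statistic whose upper-tail probability is p (0 < p <= 1)."""
    z = -_STD_NORMAL.inv_cdf(p / 2.0)  # two-sided normal quantile, accurate for tiny p
    return z * z


def p_from_chi2_1df(stat):
    """Upper-tail probability of a 1-df chi-square statistic."""
    return math.erfc(math.sqrt(stat / 2.0))


def genomic_inflation(p_values):
    """Lambda: median observed chi-square divided by the null median (about 0.455)."""
    null_median = chi2_1df_from_p(0.5)
    return median(chi2_1df_from_p(p) for p in p_values) / null_median


def gc_adjusted_threshold(n_snps, lam, alpha=0.05):
    """Bonferroni cut-off made stricter when lam > 1 (assumed genomic-control scheme)."""
    bonferroni_stat = chi2_1df_from_p(alpha / n_snps)
    # Assumption: deflation (lam < 1) is not used to relax the threshold.
    return p_from_chi2_1df(bonferroni_stat * max(lam, 1.0))


print(gc_adjusted_threshold(208_704, 1.07))
```

With λ = 1 the function returns the plain Bonferroni threshold 0.05/nSNP; larger λ yields a smaller P value cut-off.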

To account for population stratification, we fitted the same model as above but 
also included principal components derived from the genomic kinship matrix as 
fixed factors. Principal components were added sequentially to the model until population of origin no longer explained a significant portion of genetic variance across SNPs (Supplementary Information). The optimal number of principal components was one for TAN, and 14 for NOR and the combined (TAN + NOR) data sets (Extended Data Fig. 2). Population structure in the BAL data set was corrected using two principal components as fixed factors, which reduced the λ value to 1.07 (Extended Data Fig. 3). We also compared association statistics of BAL (n = 114) and the combined data (TAN + NOR, n = 1,404) post hoc, to assess the magnitude of the effect of sample size on the association statistic of the VGLL3TOP locus. The TAN + NOR data set was re-sampled 100,000 times with a sample size and age at maturity structure equivalent to those of the BAL data set. The observed association statistic for the VGLL3TOP locus in the BAL set was similar to that in the re-sampled TAN + NOR data sets (Kolmogorov–Smirnov test, P = 0.51, Extended Data Fig. 3), indicating that the weaker association in BAL is probably caused by its smaller sample size.
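The post hoc comparison relies on drawing subsets of the TAN + NOR data that match the BAL sample size and age-at-maturity structure. A minimal sketch of one such stratified draw follows, assuming individuals are given as (id, sea_age) pairs and that sampling is without replacement within each maturity class (an assumption; the helper name and toy data are hypothetical). The association test applied to each subset is not shown.

```python
import random
from collections import defaultdict


def stratified_subsample(individuals, target_counts, rng):
    """Draw a subset whose count per sea-age class equals target_counts.

    individuals   -- iterable of (individual_id, sea_age) pairs (the larger data set)
    target_counts -- mapping sea_age -> number to draw (e.g. the BAL age structure)
    rng           -- random.Random instance, so replicates are reproducible
    Assumption: sampling is without replacement within each age class.
    """
    by_age = defaultdict(list)
    for ind_id, age in individuals:
        by_age[age].append(ind_id)
    sample = []
    for age, k in target_counts.items():
        pool = by_age.get(age, [])
        if k > len(pool):
            raise ValueError(f"only {len(pool)} individuals of sea age {age}")
        sample.extend(rng.sample(pool, k))
    return sample


rng = random.Random(1)
pool = [(f"fish{i}", 1 + i % 3) for i in range(300)]  # invented: 100 fish per class
target = {1: 20, 2: 60, 3: 34}                        # invented structure, n = 114
replicate = stratified_subsample(pool, target, rng)
print(len(replicate))
```

Repeating the draw many times (100,000 in the text) and recording the association statistic each time yields the reference distribution against which the BAL value is compared.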

Identifying signatures of spatially divergent selection: $F_{ST}$ outlier test. We used an extension of the Lewontin and Krakauer test, the FLK $F_{ST}$ outlier test, which uses population trees (built from Reynolds’ genetic distances with the neighbour-joining algorithm) to estimate the expected neutral evolution (null) among populations. This method has been shown to perform well under different demographic scenarios. The empirical null distribution of SNPs was identified using the estimated population tree and 100,000 simulations.

Mode of inheritance and effect sizes of age at maturity loci. We detailed the 
genetic architecture of the loci associated with age at maturity by evaluating the 
likelihood of several inheritance models. In addition to simple additive and dom- 
inance models, we also tested various models where dominance inheritance was 
modelled conditioned on sex. Extended Data Table 2 lists the details of each model. 
Models were compared using an information-theoretic approach, where the model 
with the lowest Akaike information criterion was accepted as the optimal model 
explaining the data. The coefficients of the optimal model for the VGLL3TOP locus are given in Supplementary Table 3.
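The information-theoretic selection step reduces to computing AIC = 2k − 2 log L for each candidate inheritance model and keeping the minimum. A minimal sketch follows; the model labels, log-likelihoods and parameter counts are invented placeholders, not the values behind Extended Data Table 2.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 log L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


def select_model(candidates):
    """Return (best model name, its AIC, delta-AIC for every model).

    candidates -- mapping model name -> (maximised log-likelihood, number of parameters)
    """
    scores = {name: aic(ll, k) for name, (ll, k) in candidates.items()}
    best = min(scores, key=scores.get)
    deltas = {name: s - scores[best] for name, s in scores.items()}
    return best, scores[best], deltas


# Hypothetical inheritance models with made-up fits.
models = {
    "additive": (-1000.0, 4),
    "dominance": (-999.5, 5),
    "sex_dependent_dominance": (-990.0, 7),
}
print(select_model(models))
```

Reporting ΔAIC alongside the winner shows how clearly the optimal model is separated from the alternatives.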

The patterns of dominance at the VGLL3TOP locus were investigated in detail on the unobserved liability scale. Deviations from additivity were tested for each sex separately. In addition, differences in genetic architecture between the sexes were also tested. For these tests, we used genotype coefficients ($\beta_{genotype}$) and their standard errors obtained from the threshold model (Supplementary Table 3). We first standardized the coefficients to the [0, 1] range, such that $(\beta_{LL} + \beta_{EE})/2 = 0.5$ (that is, the average of the homozygote genotypes). Ten thousand parametric permutations were drawn from the genotype coefficients, and the additive expectation (that is, the null) was calculated as $(\beta_{LL} + \beta_{EE})/2$, which was compared with $\beta_{EL}$ (heterozygote). For direct comparisons between sex dominance patterns, test statistics were reported as the proportion of samples deviating from the null ($H_{null}$: $\beta_{EL,female} = \beta_{EL,male}$) in one direction.
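The parametric test of dominance can be sketched as follows, assuming each genotype coefficient is drawn independently from a normal distribution with its reported standard error (covariances between coefficients are ignored, which is a simplification). The two-sided P value convention and all numbers are illustrative assumptions. Because the [0, 1] standardization is a monotone rescaling, it does not change the sign of the heterozygote deviation, so it is omitted here.

```python
import random


def dominance_test(coefs, ses, n_draws=10_000, seed=0):
    """Parametric-sampling test of heterozygote deviation from additivity.

    coefs, ses -- dicts keyed "LL", "EL", "EE" holding genotype coefficients and
                  their standard errors (SE 0 for a reference genotype fixed at 0)
    Returns (median deviation of EL from the homozygote midpoint, two-sided P).
    Assumption: independent normal sampling of each coefficient.
    """
    rng = random.Random(seed)
    deviations = []
    for _ in range(n_draws):
        draw = {g: rng.gauss(coefs[g], ses[g]) for g in ("LL", "EL", "EE")}
        midpoint = (draw["LL"] + draw["EE"]) / 2  # additive expectation (null)
        deviations.append(draw["EL"] - midpoint)
    deviations.sort()
    share_above = sum(d > 0 for d in deviations) / n_draws
    # Assumed two-sided convention; the text reports one-directional
    # proportions for the female-versus-male comparison.
    p_two_sided = 2 * min(share_above, 1 - share_above)
    return deviations[n_draws // 2], p_two_sided


# Invented coefficients: LL is the reference (0), EL sits close to LL.
coefs = {"LL": 0.0, "EL": -0.2, "EE": -2.0}
ses = {"LL": 0.0, "EL": 0.15, "EE": 0.2}
print(dominance_test(coefs, ses))
```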

The proportion of variation explained by the VGLL3TOP locus. To estimate the proportion of variance in age at maturity explained by the VGLL3TOP genotype, we employed an alternative modelling framework, where the response variable (Mat Age) is expressed on the logit scale with the y = Mat Age/(1 + Mat Age) transformation. Because logit(Mat Age/(1 + Mat Age)) is equal to log(Mat Age), the coefficients are on the (log) observational scale; the advantages of this transformation are that it (1) conveniently allows quantification of the examined variation in relation to total variation, and (2) enables quantification of effect size with a straightforward interpretation. We applied this framework to the NOR, TAN and combined data sets with the same specification as in the FULL model (including principal components as fixed factors), using the glmer function in the lme4 package (version 1.1-7) in R, with a binomial error distribution (logit link) and the bobyqa optimizer. The model provided fits comparable to the additive cumulative proportional odds model: $R^2_{McF}$ and $R^2_{CS}$ were 0.19 and 0.28 for TAN, 0.13 and 0.17 for NOR, and 0.03 and 0.12 for the combined data sets, respectively (see Supplementary Information and Supplementary Table 4 for model details).

© 2015 Macmillan Publishers Limited. All rights reserved
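The identity that motivates the transformation, logit(Mat Age/(1 + Mat Age)) = log(Mat Age), follows because the odds of y = Mat Age/(1 + Mat Age) are exactly Mat Age. A small numerical check (function names are illustrative):

```python
import math


def logit(p):
    """Log-odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))


def to_proportion(mat_age):
    """Map sea age at maturity (> 0) to y = MatAge / (1 + MatAge), which lies in (0, 1)."""
    return mat_age / (1.0 + mat_age)


# The odds y / (1 - y) equal MatAge, so the logit of y is log(MatAge) and
# model coefficients act multiplicatively on sea age itself.
for age in (1, 2, 3, 4):
    print(age, logit(to_proportion(age)), math.log(age))
```

On this scale a coefficient b corresponds to multiplying the modelled sea age by exp(b), which is what makes the effect sizes easy to interpret.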

In addition to age at maturity, we also analysed the genotypic effect of the VGLL3TOP locus on variation in size at return, using analysis of variance after accounting for maturation age. We modelled the sexes separately because of the sex-dependent genetic architecture; the genotype-by-maturity-age interaction term was included in the models. Population effects were calculated within a framework similar to that of the GWA analysis, such that the same number of principal components was used here to correct for the population inflation factor.
Estimation of the coefficient of determination. McFadden’s pseudo-$R^2$ and Cox and Snell’s pseudo-$R^2$, two commonly used metrics that generalize the OLS $R^2$ framework to logistic models, were used to evaluate the goodness of fit of the model. Both are based on the likelihoods of the alternative hypothesis ($H_{alt}$, that is, with VGLL3TOP genotypes) and the null hypothesis ($H_{null}$): McFadden’s $R^2_{McF} = 1 - \log L_{alt}/\log L_{null}$, and Cox and Snell’s $R^2_{CS} = 1 - (L_{null}/L_{alt})^{2/n}$, where $n$ equals the number of observations. We used the optimum model explaining genetic architecture as $H_{alt}$ (Extended Data Table 2), and excluded only the genotype terms from that model for $H_{null}$. For TAN, $R^2_{McF}$ and $R^2_{CS}$ were 0.17 and 0.23. For NOR, $R^2_{McF}$ and $R^2_{CS}$ were 0.10 and 0.15, and the equivalent values for the combined data set were 0.12 and 0.17, respectively.
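Both pseudo-R² measures can be computed directly from the maximised log-likelihoods of the two models. The sketch below evaluates Cox and Snell's statistic on the log scale, which avoids underflow of the raw likelihoods; the log-likelihood values in the example are invented.

```python
import math


def mcfadden_r2(loglik_alt, loglik_null):
    """McFadden's pseudo-R^2 = 1 - logL_alt / logL_null."""
    return 1.0 - loglik_alt / loglik_null


def cox_snell_r2(loglik_alt, loglik_null, n_obs):
    """Cox and Snell's pseudo-R^2 = 1 - (L_null / L_alt)^(2/n), computed via logs."""
    return 1.0 - math.exp(2.0 * (loglik_null - loglik_alt) / n_obs)


# Invented log-likelihoods for a data set of 463 individuals.
print(mcfadden_r2(-400.0, -500.0), cox_snell_r2(-400.0, -500.0, 463))
```

Cox and Snell's value cannot reach 1 for discrete outcomes, which is one reason the two measures are usually reported together.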
Genome re-sequencing and functional variant detection. Thirty-two wild 
Atlantic salmon were selected for whole genome re-sequencing from seven popula- 
tions (three from the Barents/White Sea and four from the Atlantic phylogeograph- 
ical groups; see also Extended Data Table 1 and Supplementary Table 5). Three 
individuals per population were re-sequenced, except for the Tana sub-populations, 
where 14 individuals were re-sequenced. DNA was isolated from 14 adipose fin-clips (stored in ethanol) and 18 scale samples collected in 2012 and 2013 (stored in paper envelopes) using Qiagen DNeasy kits according to the manufacturer's
recommendations. DNA was quantified using Qubit fluorometry (Invitrogen). 

For high-quality DNA derived from adipose tissue and two scale-derived 
DNA extractions that had high DNA quantity and quality, sequencing libraries 
were produced using a TruSeq DNA PCR-free Library Preparation Kit. Libraries 
for the remaining 16 scale-derived DNA extracts were prepared using a TruSeq 
Nano DNA Library Preparation Kit. The main motivation for this difference was 
to select kits most suited to available sample quantities: both kits use mechan- 
ical fragmentation (Covaris), thus limiting a bias caused by using a mixture of 
enzymatic and mechanical approaches. Library preparations were performed 
according to the manufacturer's instructions (Supplementary Table 5). All 
libraries were subjected to a fragment size selection (mode = 350 base pairs) 
and sequenced to generate 2 x 125 nucleotide paired-end reads using an Illumina 
HiSeq 2500 platform. Sample preparation and sequencing were performed by 
the Norwegian Sequencing Centre, Ullevål (Oslo, Norway). Only reads passing Illumina’s chastity filter were used in subsequent analysis. We further used
FastQC to assess sequencing quality, passing lanes where the per-base quality 
score box plot indicated bases 1-110 having > Q20 for >75% of the reads. All 
lanes passed the quality criteria. 

Reads were mapped to the salmon reference genome (National Center for 
Biotechnology Information Whole Genome Shotgun (NCBI WGS) accession 
number AGKD04000000) using BWA-MEM (BWA version 0.7.10-r789; ref. 49). The thirty-two samples were sequenced to a depth of around 18× (8× to 32×). In total, 7.6 billion of 8.3 billion reads (92%) were properly aligned to the genome. SNPs and short indels were identified using FreeBayes (version 0.9.15-1; ref. 50). To filter out low-quality variants, we used the run-time parameters --use-mapping-quality and --min-mapping-quality 1, in addition to vcffilter -f "QUAL > 20". SNPs and short indels were annotated using snpEff version 4.0e
(ref. 51). The snpEff annotation database was based on the CIGENE annotation 
version 2.0 (Lien et al., submitted). 

The potential effects of the missense mutations detected in VGLL3 and AKAP11 on protein function were assessed using the PolyPhen-2 program. PolyPhen-2 predicts the possible impact of amino-acid substitutions on the
structure and function of proteins using physical and evolutionary comparative 
considerations. Owing to a lack of structural information available for these 
two genes, the assessment relied on evolutionary comparisons using multiple- 
sequence alignments. 

Identifying signatures of spatially divergent selection: iHS analysis. Genotypic data from all individuals (n = 1,518) were phased using Beagle 4.0 software with imputation of missing genotypes (0.2% of calls), using a window size of 50,000 SNPs and an overlap size of 3,000 SNPs. Burn-in, phasing and imputation were run for 10, 40 and 50 iterations, respectively, and physical distances were used as a proxy for genetic distances. We used an extended haplotype homozygosity (EHH)-based test to detect footprints of selection, using the rehh package (version 3.1.1). We first computed integrated EHH scores (iHH) using the scan_ehh function in rehh version 3.1.3 with default parameters. We then computed the integrated haplotype score (iHS) for each population separately using the ihh2ihs function in rehh version 3.1.3 (frequency bin = 0.05, maf = 0.05). iHS is a metric to quantify
the difference in EHH between the two alleles of a given SNP. The iHS statistic is 
standardized empirically to the distribution of observed iHS values over a range 
of SNPs with similar derived allele frequencies. The ancestral allele was initially 
randomly assigned for every SNP to have an even distribution of SNPs in each 
frequency category for standardization (Supplementary Information). The L allele 
of the VGLL3;s SNP was assigned derived status in each population. Therefore, 
higher levels of extended homozygosity around the L allele compared with the E 
allele within a population are indicated by negative iHS values, and higher levels of 
extended homozygosity around the E allele compared with the L allele are indicated 
by positive iHS values. 
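The standardization step can be sketched as follows: the unstandardized score ln(iHH_A/iHH_D) is centred and scaled within bins of derived-allele frequency (bin width 0.05, assumed to match the frequency-bin setting above). This mirrors the usual iHS definition and the sign convention described in the text (longer haplotypes around the derived allele give negative values), but it is a simplified stand-in, not the rehh implementation, and the input format is an assumption.

```python
import math
from statistics import fmean, pstdev


def standardized_ihs(snps, bin_width=0.05):
    """Standardize ln(iHH_ancestral / iHH_derived) within derived-allele-frequency bins.

    snps -- list of (derived_allele_freq, ihh_ancestral, ihh_derived) tuples
    Returns one iHS value per SNP (None where a bin has no spread).
    Simplified illustration; not rehh's ihh2ihs.
    """
    n_bins = math.ceil(1 / bin_width)
    raw = [math.log(ihh_a / ihh_d) for _, ihh_a, ihh_d in snps]
    bins = [min(int(freq / bin_width), n_bins - 1) for freq, _, _ in snps]
    by_bin = {}
    for b, value in zip(bins, raw):
        by_bin.setdefault(b, []).append(value)
    stats = {b: (fmean(v), pstdev(v)) for b, v in by_bin.items()}
    result = []
    for b, value in zip(bins, raw):
        mean, sd = stats[b]
        result.append((value - mean) / sd if sd > 0 else None)
    return result


# Invented iHH values: negative scores mark longer haplotypes on the derived allele.
toy = [(0.32, 2.0, 1.0), (0.33, 1.0, 1.0), (0.34, 1.0, 2.0), (0.71, 1.0, 3.0), (0.72, 3.0, 1.0)]
print(standardized_ihs(toy))
```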

We tested whether variation in age at maturity among populations could be maintained by selection towards an optimum age at maturity composition within each population, given that gene flow is expected among the phenotypically divergent local populations sampled. Balancing selection is expected to leave patterns in the genome similar to those of recent positive selection, making the two difficult to distinguish using haplotype-based methods such as iHS. Additionally, such haplotype methods have reduced power to detect selection from standing genetic variation, as may arise from balancing selection, such as spatially varying selection. However, haplotype patterns are expected to change even when selection acts on multiple haplotypes, and haplotype-based methods, such as iHS, retain some power to detect selection so long as selective sweeps are not too soft: that is, they do not contain too many different haplotypes. To test for divergent local selection, we employed a linear model where the iHS values were regressed on the average sea age of the populations (Extended Data Table 1). Statistical significance was assessed by comparing the coefficient of determination of this regression (that is, the proportion of variation in iHS explained by age at maturity) at the locus of interest with the null distribution at the genome-wide level. For this analysis, we calculated the iHS statistic for every population with at least 16 successfully genotyped individuals (32 haplotypes per population, 46 populations; see also Extended Data Table 1), and used an equal number of individuals per population (by random selection of individuals if n > 16). We assessed the effect of using 16 randomly selected individuals from the two populations in the TAN data set, and found that iHS values in the reduced set were in good agreement with the full data set (Pearson’s r = 0.72 and 0.75 for the younger and older age-structured sub-populations, respectively; $P < 10^{-16}$ for both data sets), suggesting that the iHS analysis is robust at the sample size used (Extended Data Fig. 6e, f).
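The genome-wide comparison can be sketched as an empirical P value: compute, for every SNP, the proportion of variance in population iHS explained by population mean sea age (R² of a simple linear regression), then ask what share of SNPs reach at least the value observed at the locus of interest. The sketch assumes complete iHS vectors with populations in the same order for every SNP; names and numbers are hypothetical.

```python
from statistics import fmean


def r_squared(x, y):
    """Proportion of variance in y explained by a least-squares line on x."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("both variables must vary")
    return sxy * sxy / (sxx * syy)


def empirical_p(ihs_by_snp, mean_sea_age, locus):
    """Share of SNPs whose iHS-versus-sea-age R^2 is at least that of `locus`.

    ihs_by_snp   -- dict SNP -> list of population iHS values (fixed population order)
    mean_sea_age -- list of population average sea ages, same order
    """
    r2 = {snp: r_squared(mean_sea_age, values) for snp, values in ihs_by_snp.items()}
    target = r2[locus]
    return sum(value >= target for value in r2.values()) / len(r2)


# Invented data for five populations and four SNPs.
sea_age = [1.2, 1.5, 1.8, 2.1, 2.4]
ihs = {
    "locus": [1.0, 0.5, 0.0, -0.5, -1.0],
    "snp_b": [0.1, -0.2, 0.3, 0.0, -0.1],
    "snp_c": [0.5, 0.4, -0.6, 0.2, 0.1],
    "snp_d": [-0.3, 0.2, 0.1, -0.2, 0.4],
}
print(empirical_p(ihs, sea_age, "locus"))
```

With hundreds of thousands of SNPs, the same ranking gives the genome-wide null distribution against which the VGLL3 and SIX6 regions are judged.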
Extended Data Figure 1 | Map of study populations. Bars indicate the proportion of individuals maturing after 1 (light blue), 2 (medium blue) or ≥3 years (dark blue) at sea; 1–54, NOR data set; 55–56, TAN; 57, BAL (Extended Data Table 1). Data for lake and river coordinates were obtained from the European Environmental Agency (under a Creative Commons Attribution 4 License) and the Norwegian Water Resources and Energy Directorate.
Extended Data Figure 2 | GWAS analyses for the TAN (n = 463), NOR (n = 941) and combined (n = 1,404) data sets. a, Manhattan and quantile–quantile plots of the GWAS for age at maturity in Atlantic salmon before (left) and after (right) correction for population structure. The first three rows are models including phenotypic covariates (that is, the FULL model), and the next three rows are models without phenotypic covariates (that is, the BASIC model). The y axis shows the association statistic (−log10(P values)) for each SNP ordered by chromosome and position (x axis). The genome-wide statistical significance adjusted for multiple comparisons and genomic inflation is indicated by a horizontal dashed line. The VGLL3TOP (the SNP with the highest association with age at maturity) and VGLL3 74g (the SNP in strongest linkage disequilibrium with the missense mutations in the VGLL3 gene) SNPs are shown with red arrows. QQ plots showing the deviation of P values (red line) from the null expectation (black line) are in the insets. b, Proportion of SNPs showing no evidence of significant population structure (Akaike information criterion < −2) as a function of the number of principal components included in the model, for TAN (squares), NOR (circles) and the combined data set (TAN + NOR; triangles). The numbers of principal components used in population-corrected models are marked in red. c, Relationship between population average age at maturity and allele frequency at the VGLL3TOP SNP and (d) the SIX6TOP SNP. e, Relationship between the VGLL3TOP SNP and the SIX6TOP SNP allele frequencies.
Extended Data Figure 3 | GWAS analyses for the BAL data set. Manhattan plots and quantile–quantile plots of the GWAS for age at maturity in the BAL data set (n = 114), (a) before and (b) after correction for population structure. The y axis shows the association statistic (−log10(P values)) for each SNP ordered by chromosome and position (x axis). The genome-wide statistical significance adjusted for multiple comparisons and genomic inflation is indicated by a horizontal dashed line. The VGLL3TOP and VGLL374c SNPs are shown with red arrows. The QQ plot shows the deviation of P values (red line) from the null expectation (black line). c, Distribution of association statistics for the VGLL3TOP SNP in 100,000 bootstrapped replicates with resampling, using the TAN + NOR data set combined (n = 1,404). An equivalent sampling design to the BAL data set (n = 114 and the same age at maturity structure; see Supplementary Table 1) was used in the resampling. The red arrow indicates the P value of the VGLL3TOP SNP in the BAL data set.
Extended Data Figure 4 | Gene model diagrams detailing regions around the VGLL3TOP and SIX6TOP loci. a, Gene models and genomic positions of the two genes in the genome region on chromosome 25 significantly associated with age at maturity. Missense SNPs identified by re-sequencing within the genes are indicated in green. Amino acids indicated above and below the gene model were associated with the late (L) and early (E) maturation alleles, respectively. Longer tick marks show custom 220K Affymetrix Axiom array SNPs, and shorter tick marks indicate re-sequencing variants. Notable SNPs are colour coded red (VGLL3TOP), blue (VGLL3;s) and green (the SNP tagging missense mutations in VGLL3 and the AKAP11 missense SNP). Note that missense variants on VGLL3 were identified by whole genome sequencing. The array SNP in tightest linkage disequilibrium with the VGLL3 missense variants identified by re-sequencing is 306 and 2,356 base pairs upstream ($R^2$ = 1 and 0.71, respectively). b, Gene model and linkage disequilibrium plots of an ~0.5 Mb region on chromosome 9 where a significant GWAS signal was observed before correction for population structure. The association plot shown is before correction for population structure, using the combined data set (TAN + NOR). The SIX6TOP locus is shown in red. Shorter tick marks in the SNP axis indicate re-sequencing variants. $F_{ST}$ estimates for SNPs in the region are also shown (lower graph). Closed circles indicate SNPs significantly diverged from null (neutral) expectations (FLK $F_{ST}$ outlier test, 99.5% quantile of the null distribution; 56 populations, total n = 1,404). c, Conserved elements (PhastCons) of the 200 kb region around the SIX6 gene showing the predicted forebrain distal regulatory element (red tick mark) that is located close to the SIX6TOP SNP. One re-sequenced variant in strong linkage disequilibrium with the SIX6TOP SNP was located in this region.
Extended Data Figure 5 | Details of modelling the genetic architecture of age at maturity. a, Threshold logistic models explaining variation in age at maturity in relation to the VGLL3TOP SNP in the TAN (n = 220 females, 243 males), NOR (n = 473 females, 468 males) and the combined (n = 693 females, 711 males) data sets for females (left panels) and males (right panels). Shaded grey areas around the logistic curves indicate one standard error of the threshold coefficients, and shaded red and blue areas indicate one standard error around genotype coefficients for females and males, respectively. The y axis depicts the probability of delaying maturation from one maturity age class to the next. LL genotypes were centred to zero (intercept) and had no standard error because of the rank deficiency of the model (that is, threshold degrees of freedom are prioritized in the model). Threshold coefficients are sex independent, which was the optimal model explaining the data (see Extended Data Table 2 and Supplementary Information 3). Small insets to the right of each logistic curve depict the odds of delaying maturation for the LL genotype in relation to the EE genotype (median, 50% parametric sampling quantile) and the degree of partial dominance (median, 50% parametric sampling quantile) on the unobserved liability scale (that is, the x axis in the logistic curves). The dominance estimates ($\delta$) given above each panel are scaled to the [−1, 1] range, $\delta = (2\beta_{EL} - (\beta_{LL} + \beta_{EE}))/|\beta_{LL} - \beta_{EE}|$, where negative and positive values indicate an EE-like and an LL-like expression of the phenotype (that is, delayed maturation), respectively. P values in the upper insets show the significance of the model deviating from additivity ($P_{add}$, 10,000 parametric permutations). The difference in dominance between females and males is highly significant for all data sets (P = 0.0082 for TAN, and P < 0.001 for NOR and the combined data sets). P values for all odds of delaying maturation are significant (P < 0.001, 100,000 parametric permutations). b, Predicted mean and 50% sampling quantiles (10,000 parametric permutations) of age at maturity using the logit transformation model. The y axis is log scaled. $P_{add}$ values in the insets show the significance of the model deviating from additivity (10,000 parametric permutations).
Extended Data Figure 6 | Haplotype length analysis summary. a, Manhattan plot of each SNP in the study showing the P values of the correlation between population iHS values (46 populations, 32 haplotypes per population) and the average age at maturity. Ten SNPs flanking the VGLL3TOP and SIX6TOP SNPs are marked with red circles and triangles, respectively. b, c, Same as a but showing a 5 Mb magnified view of the (b) VGLL3 and (c) SIX6 regions. d, Histogram showing the distribution of the statistic for the association between iHS and average age at maturity for all SNPs analysed in the study. Ten SNPs around the VGLL3TOP and SIX6TOP SNPs are marked with blue and red arrows, respectively, where longer arrow tails show the VGLL3TOP and SIX6TOP SNPs. e, f, iHS concordance (Pearson’s r) in the TAN data set between the reduced (n = 16) and full data sets for (e) a sub-population (55) with lower average age at maturity (n = 137) and (f) a sub-population (56) with higher average age at maturity (n = 326). Each point shows a single SNP. The lower panel shows the concordance (Pearson’s r) of the TAN full data sets to all populations (n = 46) included in the iHS analysis. The self-concordance, as in the upper panel, is indicated in red. g, Relationship between population iHS score and VGLL3TOP allele frequency. iHS = 0 (no haplotype length difference) is marked with a horizontal grey line. Positive iHS values indicate longer haplotype blocks, and therefore stronger selection, around the E allele in a population relative to the L allele, and vice versa for negative iHS values.
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Extended Data Table 1 | Geographic and life-history details of Atlantic salmon populations included in this study, with sample sizes and 
genetic data of key SNPs for each population 


P Ri Population details Summary of study samples 
eee recs River name ,; Coordinates .5 % Mat Proportion mat age(%) L(cm, f Mat Year Ho iHS* Allele freq. (L) 
ID’ code Phylot Se M A 
Lat(N) Lon(E) Maid" age ‘yr. 2yr. 3yr. >3 yr. mean) age” (20xx) VGLL3 VGLL34s SIX6 VGLL3 
NOR 


1 001.1Z  Enningdalselva ATL 58.98 11.47 824 92.4 230 144 426 423 08 78.7 16(5) 1.94 12 056 -0.242 0.75 0.47 


2  015.Z  Numedalslagen ATL 59.03 10.06 1483 95.5 2.00 18.7 63.9 16.3 1.1 77.7 16(7) 1.81 12 069 -1.237 0.88 0.66 
3 016.2 Skienselva ATL 59.13 9.63 1024 91.6 1.73 346 573 78 02 69.2 17(12) 182 12 035 -0.523 0.68 0.29 
4 033.2 Ardalselva ATL 59.14 6.17 1102 94.7 2.21 15.2 48.9 344 16 78.0 18(10) 2.00 11 0.56 -1.697 0.58 0.50 
5 035.3Z Vorma ATL 59.27 6.33 2534 98.0 1.90 243 60.2 148 06 70.6 18(9) 1.94 11 0.06 -0.078 061 0.14 
6 036.2 Suldalslagen ATL 59.48 6.25 2070 98.3 2.11 220 464 289 2.7 785 17(9) 1.82 11 0.53 -0.585 0.56 0.32 
7 038.2 Vikedalselva ATL 59.49 5.90 193 94.3 1.95 21.3 626 161 0 72.0 14(6) 1.86 12 0.79 0.79 0.61 
8 041.2 Etneelva ATL 59.67 5.93 1194 91.5 2.14 15.2 57.0 264 14 76.3 17(7) 2.24 12 0.29 -0.495 068 0.44 
9 050.Z  Eidffordvassdraget ATL 60.47 7.07 23 100 283 53 53 842 53 98.3 21(15) 2.86 11-12 0.29 -1.093 0.95 0.43 
10 055.7Z Oselva in Os ATL 60.18 5.47 1029 99.1 1.82 30.2 569 126 03 68.9 17(9) 1.88 11 0.29 -0.585 0.24 0.26 
11 060.4Z Loneelva ATL 60.53 5.49 428 91.6 1.41 616 36.2 2.2 0 60.1 16(4) 1.56 12 0.25 0.861 047 0.19 
12 073.2 Leerdalselva ATL 61.10 7.48 149 97.3 2.56 1.5 434 50.0 5.1 92.0 17(15) 306 7 0.53 0.529 0.97 0.68 
13 O77.Z Arayelva ATL 61.27 7.17 180 100 2.26 13.7 464 38.1 1.8 82.7 16(5) 2.19 12 0.44 -1.630 0.91 0.53 
14 079.2 H baeava ATL 61.22 6.07 407 90.9 2.08 164 60.7 221 08 75.0 13(7) 2.15 12 0.46 0.46 0.46 
Igyangervassdraget 


15 082.5Z DalselvainDale ATL 61.36 540 66 100 1.76 26.2 72.3 1.5 
16 082.2 Flekkeelva ATL 61.31 5.35 1816 97.5 2.22 17.1 43.8 37.7 


63.8 17(5) 1.24 13 0.12 0.198 0.38 0.06 
76.3 20(14) 2.35 11 0.45 -1.043 0.68 0.73 


25 105.2 OselWvainMolde ATL 62.79 7.72 502 95.0 1.35 66.2 32.5 1.3 
26 107.3Z SylteelvainFreena ATL 62.84 7.21 820 96.0 1.32 68.9 30.3 0.8 


57.2 19(9) 1.47 12 0.16 1.098 0.34 0.08 
56.0 18(7) 150 12 0.17 1.937 0.33 0.08 


0 
14 
17 084.7Z Nausta ATL 61.51 5.72 498 886 1.85 25.9 60.8 131 03 683 11(7) 1.82 12 0.18 0.27 0.09 
18 084.2 Jalstra ATL 61.46 5.83 133 97.7 1.91 17.6 731 92 0 732 19(11) 1.89 13 0.42 -0.553 0.63 0.37 
19 087.1Z Ryggelva ATL 61.78 6.13 253 100 1.93 24.3 53.9 218 0 735 189) 189 11 0.56 -0.664 0.72 0.50 
20 087.2  Gloppeneva = ATL. «61.77 6.20 1215 97.1 2.15 16.1 52.2 298 1.9 77.1 26(16) 262 11 0.50 -2.499 0.75 0.75 
21 089.4Z Hialma ATL 61.91 5.85 206 985 1.84 28.7 587 126 0 677 7(2) 2.00 11 0.43 0.71 0.21 
22 102.6Z Tressa ATL 62.52 7.13 156 94.2 1.81 34.5 489 165 0 654 18(10) 144 12 0.39 0.402 0.28 0.19 
23 103.1Z Mana ATL 62.54 7.44 166 95.8 1.89 283 56.5 152 0 698 14(10) 1.79 12 0.50 0.61 0.32 
24 104.2 Eira ATL 62.68 8.12 1141 95.8 1.95 286 49.1 211 1.2 73.8 18(10) 2.39 12 0.39 -0.591 0.81 0.47 
0 
0 


2f 11072 Saya ATL 62.89 854 68 85.3 1.79 29.8 63.2 7.0 QO 663 17(11) 1.82 12 0.29 0414 044 0.21 
28 111.Z  Todalselva(Toa@a) ATL 62.82 8.70 98 96.9 1.88 26.7 589 13.3 1.1 722 17(10) 2.00 12 0.24 1.000 0.71 0.24 
29° 112.2 Suma ATL 62.97 8.67 1198 95.2 2.26 20.1 35.2 42.2 25 79.0 20/8) 1.9 13 040 -0.343 0.73 0.40 
30 122.22 Vigda ATL 63.31 10.18 83 928 1.25 74.7 25.3 0 QO 51.2 17(15) 1.24 10 0.12 NA 0.12 0.06 
31. 122.2 . ATL 63.34 10.24 1218 91.9 2.38 15.0 345 480 25 83.7 24(14) 267 13 0.50 -0.714 0.77 0.46 
32 123.4Z Homla ATL 63.41 10.80 113 94.7 1.29 72.9 25.2 1.9 0 562 1811) 1.39 11 0.22 0.665 0.31 0.11 
33 138.52 Aursunda ATL 64.37 11.39 124 91.9 1.12 912 53 3.5 QO 47.8 1811) 1.00 11 0.22 0427 0.17 0.11 
34 138.2 Argardsvassdraget ATL 64.31 11.22 1335 94.0 1.23 77.5 22.3 0.2 0 53.4 13(7) 1.54 12 0.00 0.27 0.00 
35 139.2 Namsen ATL 64.46 11.52 1308 93.3 1.94 37.5 316 29.9 1.0 71.9 16(4) 163 12 063 0.924 063 0.38 
36 160.43Z Reipaga ATL 66.91 13.63 38 868 1.12 87.1 12.9 0 QO 652.5 19(8) 1.42 11-13 0.21 1.192 0.37 0.16 
37 161.Z Beiarvassdraget ATL 67.03 14.58 1561 92.1 2.09 26.0 386 344 1.0 77.0 15(9) 2.07 12 0.60 0.77 ~=0.30 
38 163.Z Saltdalsvassdraget ATL 67.10 15.42 983 946 1.87 35.5 426 210 09 74.2 7(3)) 2.71 12 0.29 0.79 0.43 
39 172.Z  Forsavassdraget ATL 68.27 16.63 117 87.2 1.55 49.0 47.1 3.9 63.3 17(10) 1.47 12 0.41 1.044 0.29 0.21 
40 174.52 aris ATL 68.55 17.56 40 92.5 2.11 324 26.5 41.2 76.8 16(7) 1.75 11-12 0.31 -0.236 0.56 0.47 


41 186.22 Roksdalsvassdraget ATL 69.05 15.87 753 94.3 1.24 76.7 227 06 
42 194.2 Laukhelevassdraget ATL 69.23 17.86 138 93.5 1.37 67.7 258 65 
43 196.2 MalseWwassdraget BW 69.27 18.51 591 944 215 315 235 439 10 786 17(5) 147 11-12 047 -0.656 082 041 
44 202.11Z Skipsfordvassdraget ATL 70.16 19.80 137 956 146 558 41.7 25 57.2 19(7) 132 12 0.11 0698 0.29 0.05 
45 212.2 Altaelva BW 69.97 23.37 2047 97.3 200 46.7 106 383 44 763 205) 200 12 055 -0.233 098 053 
46 213.2  Repparforeva BW 70.45 24.32 3932 97.5 1.36 73.1 181 85 03 612 17(6) 194 12 053 0.237 0.50 0.38 
47 224.7 LakselvainPorsanger BW 70.08 24.92 361 945 226 299 153 481 67 846 16(7) 206 12 0.50 -1.153 0.91 0.38 
48 225Z BorselvainPorsanger BW 70.31 25.52 287 955 134 703 253 40 04 615 17(5) 1.71 11 041 0.241 044 0.44 
49 231.72 ae ail BW 71.05 28.05 149 940 121 775 225 0 O 579 17(9) 129 12 041 -0685 0.32 0.21 
50 231.8Z Risfordvassdraget BW 70.98 28.17 46 935 140 634 366 0 O 586 17(5) 141 11 065 -1276 0.56 0.32 
51 234.2  Maskejohka BW 70.28 2815 327 982 143 703 174 80 24 748 30(15) 1.90 0310 050 -1.052 067 052 
52 2342 ——_Laksjohka BW 70.06 27.55 126 100 120 849 135 0 16 574 21(12) 152 03-10 0.10 0.721 0.00 0.05 
53 2392  Komageva § BW 70.24 30.52 1029 936 1.51 574 327 99 0 636 17(11) 1.76 12 0.06 -0.385 068 0.03 
54 240.2 VestreJakobsev BW 70.11 29.33 2038 97.1 1.63 499 373 124 04 683 23(10) 165 13 043 0598 0.52 0.30 
TAN 

5556" 234.2  Tanamainstem BW 70.47 28.25 86317 93.7 1.60 61.5 18.1 19.1 1.3 72.0  463(220) 2.15 01-03 


55.2 19(11) 1.37 12 0.11 NA 0.11 0.05 
58.9 19(7) 1.53 12 0.42 0.785 0.45 0.26 


55 1.25 47.6 10.5 2.2 0 137(71) 1.36 044 0.282 0.28 0.33 
56 2.14 13.9 76 169 1.3 326(149) 2.49 0.39 -0.796 0.96 0.75 
BAL 

57 Tomio BAL 65.81 24.16 4822 93.8 2.12 164 56.2 269 06 834  114(12) 1.94 05-08 0.50 0.80 0.60 


*The unique population ID used in this study (see also Extended Data Fig. 1). 

†Unique code for Norwegian rivers.

‡Phylogeographic lineage as in ref. 31. ATL, Atlantic; BW, Barents/White; BAL, Baltic.

§The total number of samples used to infer population age-structure data. Years of collection were 2006-2014 for all populations except 1971-2009 for 55-56, 2000-2013 for 57, 1977-2009 for 51, 
1985-2009 for 52, 2006-2012 for 9, 2006-2007 for 12, 2006-2013 for 15. 

||Percentage of maiden fish (first-time spawners) in the population data.

¶Number of individuals analysed with the SNP array (number of females in parentheses). The number of individuals genome sequenced was n = 3 for populations 17, 18, 34, 35, 45 and 46, and n = 14 for populations 55
and 56.

#The average sea age at maturity of individuals analysed in the GWAS. 

+r Population-specific iHS. iHS was not calculated for some populations because of either low sample size (blank) or low maf (<0.05; NA). 

**Frequency of the SIX6TOP allele associated with populations with older age at maturity.

††Frequency of the VGLL3TOP allele associated with older age at maturity.

++Populations 55 and 56 coexist sympatrically in the main stem Tana River and have younger and older age structures, respectively*’, Therefore, sea age proportions of these sub-populations are not 
directly available, and were extrapolated using weighted proportions of sea age classes assigned to each sub-population in ref. 33. 
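The SIX6 and VGLL3 columns report per-population allele frequencies. As a minimal sketch, assuming these frequencies come from simple allele counting over diploid genotype calls (the footnotes do not state the estimator), the calculation looks like this; the p/q allele labels and the example calls are hypothetical.

```python
from collections import Counter


def allele_frequency(genotypes, allele):
    """Frequency of `allele` among diploid genotype calls.

    Each call is a two-character string such as "pp", "pq" or "qq";
    missing calls (None) are skipped. Plain allele counting is assumed.
    """
    counts = Counter()
    for g in genotypes:
        if g is None:
            continue
        if len(g) != 2:
            raise ValueError(f"expected a diploid call, got {g!r}")
        counts.update(g)  # counts each of the two allele characters
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no called genotypes")
    return counts[allele] / total


# Hypothetical calls for ten fish; 'q' stands in for the allele of interest.
calls = ["pp", "pq", "qq", "pq", "pp", "qq", "qq", "pq", None, "pp"]
print(round(allele_frequency(calls, "q"), 3))
```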
Extended Data Table 2 | Quality of various genetic architecture models explaining sea age at maturity at the VGLL3TOP locus


Model | Model name | Model description | AIC (TAN) | ΔAIC (TAN) | AIC (NOR) | ΔAIC (NOR) | AIC (combined)
1 | Additive | Allelic effects are linear: e.g. pp, pq, qq coded as 0, 1, 2 | 604.95 | 22.58 | 1215.50 | 62.53 | 1814.01
2 | Dominance type I | Dominance model to first allele: e.g. pp, pq, qq coded as 0, 0, 2 | 674.45 | 92.07 | 1268.42 | 115.45 | 1916.47
3 | Dominance type II | Dominance model to second allele: e.g. pp, pq, qq coded as 0, 2, 2 | 618.52 | 36.14 | 1223.93 | 70.97 | 1824.46
4 | Dominance-sex type I | Sex-specific full dominance model type I: e.g. pp, pq, qq coded as 0, 2, 2 for males and 0, 0, 2 for females | 602.20 | 19.83 | 1200.37 | 47.40 | 1792.90
5 | Dominance-sex type II | Sex-specific full dominance model type II: e.g. pp, pq, qq coded as 0, 0, 2 for males and 0, 2, 2 for females | 683.72 | 101.35 | 1289.16 | 136.20 | 1941.96
6 | Additive-dominance | Partial dominance is modelled by a genotype model (i.e. all genotypes modelled independently) | 604.69 | 22.32 | 1204.57 | 51.60 | 1796.08
7 | Additive-dominance with sex, type I | Sex-specific partial dominance is modelled by coding genotypes independently and interacting with sex | 594.81 | 12.43 | 1171.84 | 18.88 | 1755.59
8 | Additive-dominance with sex, type II | Sex-dependent partial dominance is modelled by coding genotypes independently with a sex interaction, plus sex-specific modelling of sea age threshold levels | 582.37 | 0.00 | 1152.97 | 0.00 | 1717.58


The FULL model (that is, with phenotypic covariates) including population structure was fitted to the TAN (n = 463), NOR (n = 941) and combined (n = 1,404) data sets. ΔAIC, difference in AIC from the best-supported (lowest-AIC) model within each data set.
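The ΔAIC columns are differences from the lowest-AIC model within each data set. The sketch below recomputes them from the published AIC values, spells out the genotype codings of models 1 to 5 from the description column, and, as an extension not reported in the table, converts the differences into Akaike weights. Because the published AIC values are themselves rounded, a recomputed difference can differ from the printed ΔAIC by 0.01 (for example 92.08 versus 92.07 for model 2 in TAN). The model keys such as `dominance_sex_I` are illustrative names, not identifiers from the source.

```python
import math

# AIC for models 1-8 in table order, copied from Extended Data Table 2.
AIC = {
    "TAN": [604.95, 674.45, 618.52, 602.20, 683.72, 604.69, 594.81, 582.37],
    "NOR": [1215.50, 1268.42, 1223.93, 1200.37, 1289.16, 1204.57, 1171.84, 1152.97],
    "combined": [1814.01, 1916.47, 1824.46, 1792.90, 1941.96, 1796.08, 1755.59, 1717.58],
}

# Genotype codings (pp, pq, qq) for the full-dominance and additive models.
CODINGS = {
    "additive": {"pp": 0, "pq": 1, "qq": 2},
    "dominance_I": {"pp": 0, "pq": 0, "qq": 2},
    "dominance_II": {"pp": 0, "pq": 2, "qq": 2},
}
# Sex-specific full-dominance models reuse one of the codings above per sex.
SEX_SPECIFIC = {
    "dominance_sex_I": {"male": "dominance_II", "female": "dominance_I"},
    "dominance_sex_II": {"male": "dominance_I", "female": "dominance_II"},
}


def encode(genotype, model, sex=None):
    """Numeric genotype code under one of models 1-5."""
    if model in CODINGS:
        return CODINGS[model][genotype]
    return CODINGS[SEX_SPECIFIC[model][sex]][genotype]


def delta_aic(aics):
    """Difference of each AIC from the minimum AIC in the same data set."""
    best = min(aics)
    return [a - best for a in aics]


def akaike_weights(aics):
    """Relative likelihoods exp(-dAIC / 2), normalized to sum to 1."""
    rel = [math.exp(-d / 2.0) for d in delta_aic(aics)]
    total = sum(rel)
    return [r / total for r in rel]


for name, values in AIC.items():
    deltas = delta_aic(values)
    weights = akaike_weights(values)
    best_model = deltas.index(0.0) + 1  # models are numbered from 1
    print(name, "best model:", best_model,
          "dAIC:", [round(d, 2) for d in deltas],
          "weight of best:", round(weights[best_model - 1], 3))
```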
Repairing oxidized proteins in the bacterial
envelope using respiratory chain electrons


Alexandra Gennaris, Benjamin Ezraty, Camille Henry, Rym Agrebi, Alexandra Vergnes, Emmanuel Oheix,
Julia Bos, Pauline Leverrier, Leon Espinosa, Joanna Szewczyk, Didier Vertommen, Olga Iranzo,
Jean-François Collet & Frédéric Barras


The reactive species of oxygen and chlorine damage cellular 
components, potentially leading to cell death. In proteins, 
the sulfur-containing amino acid methionine is converted to 
methionine sulfoxide, which can cause a loss of biological activity. 
To rescue proteins with methionine sulfoxide residues, living 
cells express methionine sulfoxide reductases (Msrs) in most 
subcellular compartments, including the cytosol, mitochondria and 
chloroplasts¹⁻³. Here we report the identification of an enzymatic
system, MsrPQ, repairing proteins containing methionine sulfoxide 
in the bacterial cell envelope, a compartment particularly exposed 
to the reactive species of oxygen and chlorine generated by the 
host defence mechanisms. MsrP, a molybdo-enzyme, and MsrQ, a 
haem-binding membrane protein, are widely conserved throughout 
Gram-negative bacteria, including major human pathogens. MsrPQ 
synthesis is induced by hypochlorous acid, a powerful antimicrobial 
released by neutrophils. Consistently, MsrPQ is essential for the 
maintenance of envelope integrity under bleach stress, rescuing 
a wide series of structurally unrelated periplasmic proteins 
from methionine oxidation, including the primary periplasmic 
chaperone SurA. For this activity, MsrPQ uses electrons from the 
respiratory chain, which represents a novel mechanism to import 
reducing equivalents into the bacterial cell envelope. A remarkable 
feature of MsrPQ is its capacity to reduce both rectus (R-) and 
sinister (S-) diastereoisomers of methionine sulfoxide, making this 
oxidoreductase complex functionally different from previously 
identified Msrs. The discovery that a large class of bacteria contain 
a single, non-stereospecific enzymatic complex fully protecting 
methionine residues from oxidation should prompt a search for 
similar systems in eukaryotic subcellular oxidizing compartments, 
including the endoplasmic reticulum. 

The fact that no Msr had been identified in the cell envelope 
of important human pathogens, including Escherichia coli and 
Pseudomonas aeruginosa, was surprising as this compartment is par- 
ticularly exposed to the oxidizing compounds present in the envi- 
ronment. We postulated that such a methionine sulfoxide (Met-O) 
reducing system had remained unidentified, and applied a genetic 
approach to uncover it, using E. coli as a model. We first constructed 
an E. coli Met auxotroph mutant lacking all cytoplasmic Msrs and found 
this strain (JB590) to be unable to use Met-O as the only Met source 
(Fig. 1a). We then searched for suppressor mutations conferring Met-O 
reducing capacity to JB590, which led to the isolation of strain BE100 
(Fig. 1a). Genetic analysis of the suppressor revealed the presence of 
an insertion sequence element (IS2) within yedV, a gene coding for the 
histidine kinase of the uncharacterized YedV/YedW two-component 
system⁴. In close vicinity were two genes, yedY and yedZ, encoding,
respectively, a periplasmic molybdopterin-containing oxidoreductase
and its putative membrane redox partner⁵,⁶. YedY had been shown
to reduce a variety of substrates in vitro, including trimethylamine
N-oxide, and dimethyl, methionine and tetramethylene sulfoxides⁷.
However, its physiological function had remained elusive, although a 
recent study in Azospira suillum suggested the homologous protein to 
Figure 1 | The MsrPQ system reduces free Met-O and is induced by
HOCl. a, JB590, a methionine auxotroph (Met−, numbered ‘1’), lacking
all cytoplasmic Msrs (Met− Msr−, ‘2’), cannot grow on Met-O as the only
Met source, in contrast to the suppressor BE100 (Met− Msr− SupMet-O, ‘3’).
Deletion of yedY (renamed msrP, ‘7’), moeA (‘6’) and tatC (‘5’), but
not of trxA (‘4’), prevents the growth of BE100. b, The yedYZ operon
is upregulated in BE100. The increase is observed at both the mRNA
(quantitative PCR with reverse transcription (RT-qPCR), top; error bars,
mean ± s.e.m.; n = 3) and protein (western blot, bottom) levels. The
RT-qPCR primers were designed to quantify the yedY–yedZ mRNA.
c, Immunoblot analysis showing that HOCl (2 mM), but not H2O2
(1 mM), induces YedY synthesis in a wild-type strain. Images in a–c are
representative of experiments made in biological triplicate. Uncropped
blots are in Supplementary Fig. 1.
Figure 2 | MsrP non-stereospecifically reduces protein-bound Met-O.
a, Oxidation of Met in calmodulin (CaM) by H2O2 leads to a mobility shift
of the oxidized protein (CaMox); compare lane 2 with lane 1. Incubation
of CaMox with MsrP and a reducing system involving dithionite and
benzyl viologen restores the mobility (lane 3). b, MsrP can reduce Met-O
in CaMox. The oxidation state of peptides containing either Met36,
Met109, Met124 or Met145–146 was determined by LC-MS/MS. Error
bars, mean ± s.e.m.; n = 4 for Met36, Met109 and Met145–146; n = 5 for Met124.
Met-O residues were detected in the untreated and MsrP-treated samples
owing to limitations inherent to the methodology applied and to oxidation
of the samples during analytical handling. c, Representation of the two
diastereoisomers of Met-O, R-Met-O and S-Met-O. d, MsrP exhibits
activity towards both diastereoisomers (left), in contrast to the stereospecific
enzymes MsrA and MsrB (right). Specific activities were assayed using
64 mM of either R- or S-Met-O. Error bars, mean ± s.d.; n = 3. e, The
suppressor BE100 is able to grow on both isoforms of Met-O. Images in
a and e are representative of experiments made in biological triplicate.
The uncropped gel is in Supplementary Fig. 2.


be important for hypochlorous acid (HOCl) resistance⁸. We found that
insertion of the IS2 led to a 100-fold increase in the levels of the yedYZ
messenger RNA (mRNA) in strain BE100 and to higher YedY protein
levels (Fig. 1b). Deletion of either yedY or yedZ prevented BE100
from growing on Met-O (Fig. 1a and Extended Data Table 1), while
the simultaneous overproduction of YedY and YedZ, but not of YedY
or YedZ alone, rendered the parental strain JB590 able to use Met-O
(Extended Data Table 1). Altogether, these results indicated that the
ability of the suppressor strain BE100 to reduce Met-O resulted from
the increased synthesis of YedY and YedZ, implying that these two
proteins function together as an Msr system. Growth of the BE100
strain was dependent on moeA, a gene required for the synthesis of
molybdopterin cofactors, and on tatC, encoding a protein required
for the translocation of metalloenzymes across the inner membrane
(Fig. 1a). Exposure of wild-type cells to HOCl, but not to H2O2, induced
the synthesis of YedY to levels comparable to those observed in BE100
(Fig. 1c), indicating that these proteins are specifically expressed in
response to bleach stress. Interestingly, induction by HOCl was dependent
on the presence of a functional YedV/YedW system (Extended
Data Fig. 1).

All previously identified Msrs rely on electrons derived from
NADPH via the thioredoxin (Trx) system for activity¹. This was not
the case for YedYZ, as deletion of trxA, encoding the Trx responsible
for Msr recycling’, had no effect on the ability of BE100 to reduce
Met-O (Fig. 1a). As YedZ contains a b-type haem’, a cofactor typically
associated with the quinone-oxidizing cytochrome b of the respiratory 
chain complexes, we considered the respiratory chain as a potential 
electron source. Deletion of menA and ubiE, two genes required for 
quinone synthesis, prevented BE100 from using Met-O (Extended Data 
Table 1), supporting a model in which YedZ uses electrons derived 
from the inner membrane pool of mature quinones to provide reduc- 
ing equivalents to YedY (Extended Data Fig. 2). From now on, YedY 
and YedZ will be referred to as MsrP (for periplasm) and MsrQ 
(for quinone), respectively. 
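The proposed electron path (quinol, then MsrQ, then MsrP, then Met-O) can be summarized as a stoichiometric sketch using two standard half-reactions. Which quinone species (ubiquinone or menaquinone, consistent with the menA and ubiE requirement) feeds MsrQ, and the intermediate steps through the haem and molybdenum cofactors, are not specified by this bookkeeping; the third line is simply the sum of the first two.

```latex
\begin{aligned}
\mathrm{QH_2} &\longrightarrow \mathrm{Q} + 2\,\mathrm{H^+} + 2\,e^- \\
\text{Met-O} + 2\,\mathrm{H^+} + 2\,e^- &\longrightarrow \text{Met} + \mathrm{H_2O} \\
\text{Met-O} + \mathrm{QH_2} &\longrightarrow \text{Met} + \mathrm{Q} + \mathrm{H_2O}
\end{aligned}
```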

We then tested whether MsrPQ, in addition to free Met-O, also 
rescued Met-O residues present in proteins. Purified MsrP was shown 
to reduce N-acetyl-Met-O, a substrate mimicking protein-bound 
Met-O (Extended Data Fig. 3a), with a Michaelis constant (Km) of
3.8 ± 1.2 mM, in line with the values reported for other Msrs¹⁰. Note
that in the experiments involving purified MsrP, electrons were provided
to the oxidoreductase by an inorganic system reducing molybdoenzymes¹¹.
Next, we tested the ability of MsrP to reduce oxidized
calmodulin (CaMox), a substrate commonly used to assess Msr activity.
We used a gel shift assay based on the reduced mobility exhibited
in SDS-polyacrylamide gel electrophoresis (SDS-PAGE) by proteins
containing Met-O¹². Incubation of CaMox with MsrP restored its
mobility, suggesting that MsrP was able to reduce Met-O residues in 
CaMox (Fig. 2a). This was confirmed by showing with liquid chroma- 
tography-tandem mass spectrometry (LC-MS/MS) that the oxidized 
Met residues that could be detected in CaMox were reduced back to 
levels similar to those observed in CaM after incubation with MsrP 
(Fig. 2b). Altogether, these results indicated that MsrP is able to reduce 
protein-bound Met-O. 
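The LC-MS/MS read-out in Fig. 2b is reported as the proportion of each Met-containing peptide present as Met-O, summarized as mean ± s.e.m. The source does not give the formula, so the following is a minimal sketch that assumes the percentage is the oxidized-peptide signal divided by the summed oxidized and reduced signals; the peak areas are hypothetical.

```python
import math
import statistics


def percent_met_o(oxidized_area, reduced_area):
    """Percentage of a peptide present as Met-O, from two peak areas (assumed definition)."""
    total = oxidized_area + reduced_area
    if total <= 0:
        raise ValueError("peak areas must sum to a positive value")
    return 100.0 * oxidized_area / total


def mean_sem(values):
    """Mean and standard error of the mean (sample s.d. divided by sqrt(n))."""
    if len(values) < 2:
        raise ValueError("need at least two replicates for an s.e.m.")
    return statistics.mean(values), statistics.stdev(values) / math.sqrt(len(values))


# Hypothetical (oxidized, reduced) peak areas for one peptide in four replicates.
replicates = [(8.0e6, 2.0e6), (7.5e6, 2.5e6), (9.0e6, 1.0e6), (8.5e6, 1.5e6)]
percents = [percent_met_o(ox, red) for ox, red in replicates]
mean, sem = mean_sem(percents)
print(f"Met-O: {mean:.1f} ± {sem:.1f} % (n = {len(percents)})")
```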

Upon oxidation, two diastereoisomers of Met-O can form, referred
to as R and S, owing to the asymmetric position of the oxidized sulfur
atom in the side chain (Fig. 2c). All Msrs described so far exhibit
stereospecificity, specifically reducing either the R (MsrB, MsrC)
or the S isoform (MsrA, BisC). Using highly pure diastereoisomers
(Extended Data Fig. 4), we found MsrP to exhibit activity towards
both (Fig. 2d), with Km values of 25.7 ± 4.7 mM and 8.0 ± 2.7 mM for
R- and S-Met-O, respectively (Extended Data Fig. 3b). Accordingly,
the BE100 suppressor strain was able to use R- and S-Met-O (Fig. 2e), 
in contrast to strains expressing single stereospecific Msrs 
(Extended Data Fig. 3c). Thus, MsrP is a new type of Msr with no 
stereospecificity. 
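To put the Km values in context: under simple single-substrate Michaelis–Menten kinetics (an assumption here; the text reports Km but not the rate law used), the fraction of Vmax reached at substrate concentration [S] is [S]/(Km + [S]). At the 64 mM used for the specific-activity assays in Fig. 2d, this gives about 0.71 for R-Met-O and 0.89 for S-Met-O, so those specific activities are not plain Vmax values.

```python
# Km values (mM) reported in the text for purified MsrP.
KM_MM = {"N-acetyl-Met-O": 3.8, "R-Met-O": 25.7, "S-Met-O": 8.0}

ASSAY_MM = 64.0  # substrate concentration given in the Fig. 2d legend


def fractional_saturation(substrate_mm, km_mm):
    """v / Vmax = [S] / (Km + [S]) for Michaelis-Menten kinetics."""
    if substrate_mm < 0 or km_mm <= 0:
        raise ValueError("need [S] >= 0 and Km > 0")
    return substrate_mm / (km_mm + substrate_mm)


for name, km in KM_MM.items():
    print(f"{name}: Km = {km} mM, v/Vmax at {ASSAY_MM:g} mM = "
          f"{fractional_saturation(ASSAY_MM, km):.2f}")
```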

To search for the physiological substrates of MsrP, periplasmic proteins
from a ΔmsrP mutant were oxidized with HOCl, incubated with
MsrP and subjected to a semi-quantitative two-dimensional LC-MS/MS
analysis. Twenty proteins that had one or more HOCl-oxidized Met
residues that MsrP could reduce were identified (Extended Data 
Table 2). Using gel shift assays in combination with LC-MS/MS ana- 
lysis, we confirmed the ability of MsrP to reduce the chaperone SurA 
and the lipoprotein Pal (Fig. 3a, b). Altogether, these results established 
that MsrP is able to repair a wide panel of structurally and functionally 
diverse periplasmic proteins in vitro. 

SurA is the primary periplasmic chaperone, escorting most β-barrel
proteins to the outer membrane¹³,¹⁴. As HOCl-oxidized SurA loses its
chaperone activity (Fig. 4a), we used this property to probe the physiological
importance of the MsrPQ system. First, we showed that SurA
could be oxidized in vivo by HOCl and that expression of the MsrPQ
system, but not of MsrP alone, restored its mobility (Fig. 3c). Similar
results were obtained for Pal (Fig. 3d), confirming that MsrP and MsrQ
collaborate in the protection of SurA and Pal from oxidative damage.
We then tested whether the repair of SurA by MsrP, which restores the activity
of the chaperone in vitro (Fig. 4a), was important to keep SurA active
under HOCl stress. For this, we used a mutant strain lacking the chaperone
Skp, in which SurA becomes essential¹⁵,¹⁶. We found that deleting
msrP rendered the Δskp strain hypersensitive to HOCl (Fig. 4b),
suggesting that oxidized, inactive SurA accumulates in the absence
of MsrP. In agreement with this, the sensitivity of the Δskp ΔmsrP


© 2015 Macmillan Publishers Limited. All rights reserved 


[Figure 3 graphic: a, SurA and b, Pal, each untreated, oxidized (ox) and oxidized + MsrP, shown as gels (top) and as percentage of Met-O per Met residue (bottom); c, SurA and d, Pal immunoblots at 0, 5, 15, 30, 60 and 90 min in cells expressing msrP and msrQ (top) or msrP alone (bottom).]


Figure 3 | The MsrPQ system rescues oxidized Met residues in SurA
and Pal. a, b, Oxidation of SurA (SurA ox) and Pal (Pal ox) by H2O2
leads to a mobility shift resulting from Met-O formation. Incubation
with MsrP and the inorganic reducing system restores their mobility
(top). The percentages of Met-O in the various samples were determined
by LC-MS/MS analysis, confirming that MsrP reduces Met-O in SurA
and Pal (bottom). Error bars, mean ± s.e.m.; n = 3. Met-O residues were
detected in the untreated and MsrP-treated samples owing to limitations
inherent to the methodology applied and to oxidation of the samples during
analytical handling. c, d, ΔmsrPQ cells carrying msrP either alone or
with msrQ under an isopropyl β-D-1-thiogalactopyranoside (IPTG)-inducible
promoter on a plasmid (pAG192 and pAG195, respectively)
were grown with IPTG (100 μM). Cells were treated with chloramphenicol
(300 μg ml−1) at an absorbance (A600 nm) of 0.5 to block new protein
synthesis, and HOCl (3.5 mM) was added. Synthesis of MsrP and MsrQ
together (top), but not of MsrP alone (bottom), restores SurA and Pal
mobility. Images in a–d are representative of experiments made in
biological triplicate. The small shift exhibited by SurA over time in the
absence of MsrPQ could be due to a residual Msr activity, possibly an
NADPH-dependent membrane-bound Msr activity previously detected²¹.
Uncropped gels and blots are in Supplementary Fig. 3.


mutant to HOCl was suppressed by overexpression of SurA (Fig. 4c).
Further highlighting the need to protect Met residues in periplasmic
proteins, HOCl-pretreated ΔmsrP mutants were found to be more sensitive
to SDS, a phenotype indicative of defects in the outer membrane
(Fig. 4d)¹⁷.

The conservation of MsrPQ throughout Gram-negative bacteria 
(Extended Data Figs 5 and 6) illustrates the importance of having a 
Met-O reducing system in the periplasm. Neisseria species stand out 
as an exception in lacking MsrPQ. However, in these bacteria, evolu- 
tionary tinkering generated an envelope hybrid protein combining 
two classic stereospecific Msr domains¹⁸. A remarkable feature of
MsrPQ is that its rescue activity depends on electrons provided by 
the respiratory chain. This represents an entirely novel way to provide 
reducing power for protein quality control in the envelope. Indeed, 
known reducing systems functioning in the periplasm use electrons 
provided by the inner membrane protein DsbD and Trx¹⁹. Hence,
diverting electrons from the respiratory chain to control extracyto- 
solic protein quality is an unprecedented link between metabolism 
and cellular integrity. 

The chaperone SurA is one of the targets of the MsrPQ system. 
Having a protein folding helper under the control of a repair system 
reveals an additional layer in the complex control network of periplasmic
protein quality. Testing whether this system is an attractive target for
antimicrobial development, as suggested by the colonization defect
exhibited by the msrP mutant in Campylobacter jejuni²⁰, will be the
Figure 4 | The reducing activity of the MsrPQ system is important for
envelope integrity. a, Repair of oxidized SurA (SurA ox) by MsrP
(SurA ox + MsrP) restores the ability of SurA to protect thermally
unfolded citrate synthase from aggregation. The graph is representative of
experiments made in biological triplicate (a.u., arbitrary units). b, While
the wild-type (WT), Δskp and ΔmsrP strains are only moderately
affected by exposure to HOCl (2 mM), the viability of the Δskp ΔmsrP
mutant (in which SurA is essential) is decreased. Error bars, mean ± s.e.m.;
n = 3. c, The sensitivity of the Δskp ΔmsrP mutant to HOCl is suppressed
by SurA overexpression. Error bars, mean ± s.e.m.; n = 4. d, Pre-treatment
with HOCl renders the ΔmsrP mutant hypersensitive to SDS, indicative of
envelope defects. Error bars, mean ± s.e.m.; n = 3.


subject of future research. By highlighting the importance of pro- 
tecting proteins targeted to oxidizing compartments, our work calls 
for a detailed investigation of the process of Met-O reduction in the 
endoplasmic reticulum, where only an R-Met-O-specific MsrB has 
been identified*. As has long been speculated, a possibility would be 
that the endoplasmic reticulum contains an epimerase catalysing the 
interconversion of R- and S-Met-O. Alternatively, in light of the pres- 
ent study, the endoplasmic reticulum could contain a novel Met-O 
reducing system yet to be discovered. 


Online Content Methods, along with any additional Extended Data display items and 
Source Data, are available in the online version of the paper; references unique to 
these sections appear only in the online paper. 
METHODS


No statistical methods were used to predetermine sample size. The experiments 
were not randomized. The investigators were not blinded to allocation during 
experiments and outcome assessment. 

Strains and microbial techniques. The strains used in this study are listed in 
Supplementary Table 1. Unless otherwise specified, for all deletion mutants, the 
corresponding alleles from the Keio collection²² were transferred into the MC4100
wild-type strain using standard P1 transduction procedures²³ and checked by
PCR. To excise the resistance cassette, we used pCP20 (refs 22, 24). Strain AG227,
deleted for the entire yedYZ operon, was constructed as follows. First, a cat-sacB
cassette, encoding chloramphenicol acetyl transferase and SacB, a protein conferring
sensitivity to sucrose, was amplified from strain CH1990 using primers
yedYZ::cat-sacB_Fw and yedYZ::cat-sacB_Rv. The resulting PCR product shared
a 40-base-pair (bp) homology to the 5′ untranslated region of yedY (msrP) and to
the 3′ untranslated region of yedZ (msrQ) at its 5′ and 3′ ends, respectively. After
purification, the PCR product was transformed by electroporation into CH1940.
These cells harbour the pSIM5-tet vector, which encodes the Red recombination
system proteins Gam, Beta and Exo under the control of the temperature-sensitive
repressor cI857, encoded by the same vector. Expression of the Gam, Beta and
Exo proteins was induced by shifting the cells to 42°C for 15 min before making
them electrocompetent. Recombinant cells were selected on chloramphenicol-containing
plates (25 μg ml−1) at 37°C for 16 h. At this temperature, the pSIM5-tet
vector, which has a temperature-sensitive origin of replication, is lost. Colonies 
were also tested for the presence of the cat-sacB cassette by negative selection 
on sucrose-containing media (5% sucrose, no NaCl). Finally, we verified that the 
cat-sacB cassette replaced the msrPQ operon in the resulting strain (AG219) by 
sequencing across the junctions. The cat-sacB cassette was subsequently moved 
from AG219 to TP1004 by P1 transduction. The cat-sacB cassette was eliminated 
from the resulting strain (AG220) by transforming it with the pSIM5-tet plasmid, 
electroporating it with the oligonucleotide Delta_yedYZ (300 ng) and performing 
lambda red recombination as described above. Recombinants were selected on 
sucrose-containing media at 30°C for 16h. To eliminate the plasmid, the selected 
colonies were grown at 37°C for 16h. Loss of the cassette in the resulting AG227 
strain was verified by positive (sucrose resistance) and negative (chloramphenicol 
sensitivity) selection and by PCR. 

The msrQ deletion mutant (strain BE105) was generated using the PCR knock- 
out method developed in ref. 24. Briefly, a DNA fragment containing the cat gene 
flanked with the homologous sequences found upstream and downstream of the 
yedZ gene was PCR-amplified using pKD3 as template and the oligonucleotides 
P1_Up_YedZ and P2_Down_YedZ. Strain BE100, carrying plasmid pKD46, 
was then transformed by electroporation with the amplified linear fragment. 
Chloramphenicol-resistant clones were selected and verified by PCR. 

The msrP::lacZ fusion was constructed using the method described in ref. 25. 
Briefly, the msrP promoter region lying between nucleotide −797 and nucleotide
+63, using the A nucleotide within the initiation triplet as a reference, was
amplified by PCR with the appropriate oligonucleotides (lacI-msrP forward and
lacZ-msrP reverse). Using mini-lambda-mediated recombineering, the PCR product was
then directly recombined with the chromosome of a modified E. coli wild-type 
strain (PM1205), carrying a Pgap-cat-sacB cassette inserted in front of lacZ, at the 
ninth codon. Recombinants were selected for loss of the cat-sacB genes, resulting 
in the translational fusion of msrP to lacZ. 
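The coordinates of the cloned promoter fragment can be checked arithmetically. Assuming the usual convention that the A of the initiation codon is +1 and that there is no position 0 (the text states only that this A is the reference), the fragment from −797 to +63 is 860 nt long, and its msrP part ends after exactly 21 codons, that is, on a codon boundary, as expected for an in-frame translational fusion.

```python
def span_length(start, end):
    """Inclusive length of [start, end] when positions skip 0 (..., -1, +1, ...)."""
    if start == 0 or end == 0:
        raise ValueError("position 0 does not exist in this numbering")
    if end < start:
        raise ValueError("end must not precede start")
    length = end - start + 1
    if start < 0 < end:
        length -= 1  # do not count the non-existent position 0
    return length


START, END = -797, +63            # msrP promoter fragment from the Methods
total = span_length(START, END)   # upstream nucleotides plus coding nucleotides
codons, extra = divmod(END, 3)    # positions +1..+63 lie in the msrP coding sequence
print(f"fragment: {total} nt; upstream: {-START} nt; "
      f"coding: {END} nt = {codons} codons (+{extra} nt)")
```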

Plasmid construction. The plasmids and primers used in this study are listed in 
Supplementary Tables 2 and 3, respectively. The YedY-His6 (MsrP-His6) expression
vector was constructed as follows. Site-directed mutagenesis using primers
pTAC_NdeI_Fw and pTAC_NdeI_Rv was performed using pTAC-MAT-Tag-2 as
template to introduce an NdeI restriction site in the vector, yielding vector pAG177.
yedY (msrP) DNA was amplified from the chromosome (MC4100) using primers
pTAC_yedY_Fw and pTAC_yedY-His6_Rv, which resulted in the fusion of a His6
tag coding sequence at the 3′ end. The PCR product was subsequently cloned into
pAG177 using NdeI and BglII restriction sites, generating plasmid pAG178. To
construct IPTG-inducible pTAC-MAT-Tag-2 vectors expressing either MsrP (without
tag) or both MsrP and MsrQ, we first amplified the corresponding coding DNA
sequences (msrP or the msrPQ operon) from the chromosome of strain MC4100
using primer pairs pTAC_yedY_Fw/pTAC_yedY_Rv and pTAC_yedY_Fw/
pTAC_yedZ_Rv, respectively. The PCR products were then cloned into pAG177
using restriction sites NdeI and BglII, yielding pAG192 (MsrP) and pAG195 
(MsrPQ). The complementation pAM238 vectors constitutively expressing either 
MsrP or MsrQ alone (without tag) or both MsrP and MsrQ were constructed as 
follows. We first amplified the corresponding coding DNA sequences (msrP, msrQ 
or the msrPQ locus) in addition to a 50 bp upstream region from each start codon 
(to include a ribosomal binding site) from the chromosome of strain MG1655 
using primer pairs pAM238_yedY_Fw/ pAM238_yedY_Rv, pAM238_yedZ_Fw/ 
pAM238_yedZ_Rv and pAM238_yedY_Fw/ pAM238_yedZ_Rv, respectively. 
The PCR products were then cloned into pAM238 using restriction sites KpnI and
PstI, yielding pAG264 (MsrP), pAG275 (MsrQ), and pAG265 (MsrPQ). 

The vector allowing the arabinose-inducible expression of SurA was constructed
as follows. The surA-encoding DNA and its 50 bp upstream region (to include a
ribosomal binding site) were amplified from the chromosome of strain MG1655
using the primer pair surA_Fw/surA_Rv. The PCR product was then cloned into
pBAD33 using restriction sites KpnI and XbaI, yielding vector pAG290.
Analysis of the yedYZ operon expression by RT-qPCR. Expression levels of the
yedYZ (msrPQ) mRNA were assessed in M63 minimal medium supplemented
with 0.5% glycerol, 0.15% casamino acids, 1 mM MgSO4, 1 mM MoNa2O4,
17 μM Fe2(SO4)3 and vitamins (thiamine 10 μg ml−1, biotin 1 μg ml−1, riboflavin
10 μg ml−1 and nicotinamide 10 μg ml−1). Overnight cultures of MG1655 were
diluted to A600 nm = 0.04 in fresh M63 minimal medium (100 ml) and cultured
aerobically at 37°C until A600 nm = 0.8. Cells (10 ml) were then pelleted, resuspended
in TriPure (Roche) and homogenized. After mixing with chloroform,
RNA was isolated by centrifugation (15 min, 15,700g, 4°C), precipitated with
isopropanol, washed with 70% ethanol, dried and finally resuspended in DEPC-treated
water. Any residual DNA was eliminated by treatment of the sample with DNase
(Turbo DNA-free Kit, Ambion). A RevertAid RT kit (Thermo Scientific) was used
to generate complementary DNA (cDNA) from 1 μg RNA extracted from each of
the cultured strains. cDNAs were then diluted 1/10 and subjected to qPCR, using
a qPCR Core kit for SYBR Green I No ROX (Eurogentec) and a MyiQ Single-Colour
Real-Time PCR Detection System (Bio-Rad). Expression levels of yedYZ
were normalized to the expression of gapA. Primers used for qPCR analysis were
qPCR_yedYZ_Fw and qPCR_yedYZ_Rv for yedYZ, and qPCR_gapA_Fw and
qPCR_gapA_Rv for gapA (Supplementary Table 3).
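Expression of yedYZ was normalized to gapA, and the main text reports a roughly 100-fold increase in BE100. The Methods do not say how fold changes were calculated; one common approach is the 2^−ΔΔCt method, sketched here under its usual assumption of about 100% amplification efficiency for both amplicons. The Ct values are hypothetical. Under that assumption, a 100-fold change corresponds to a ΔΔCt of about −6.64 cycles.

```python
import math


def fold_change_ddct(ct_target_test, ct_ref_test, ct_target_control, ct_ref_control):
    """Relative expression 2**(-ddCt), assuming ~100% efficiency for both amplicons."""
    d_test = ct_target_test - ct_ref_test              # target normalized to reference (e.g. gapA)
    d_control = ct_target_control - ct_ref_control
    ddct = d_test - d_control
    return 2.0 ** (-ddct)


# Hypothetical Ct values: target (yedYZ) and reference (gapA) in a test and a control strain.
print(fold_change_ddct(22.0, 15.0, 25.0, 15.0))
# Cycle difference equivalent to a 100-fold change under the same assumption.
print(round(math.log2(100), 2))
```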
Immunoblot analysis of MsrP expression. Synthesis of MsrP in strains JB590 and
BE100 was assessed as follows. Overnight cultures were diluted to A600 nm = 0.04
in fresh M63 minimal medium (100 ml) and cultured aerobically at 37°C until
A600 nm = 0.8. Nine hundred microlitres of each culture were then precipitated with
10% ice-cold trichloroacetic acid (TCA); pellets were washed with ice-cold acetone,
dried, resuspended and heated at 95°C in Laemmli SDS sample buffer (SB buffer)
(2% SDS, 10% glycerol, 60 mM Tris-HCl, pH 7.4, 0.01% bromophenol blue), and
loaded on an SDS-PAGE gel for immunoblot analysis. The protein amounts loaded
were standardized by taking into account the A600 nm values of the cultures.
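Loading was standardized using each culture's A600 nm. One simple scheme, assumed here rather than taken from the source, keeps absorbance × volume constant across lanes, so a culture at twice the density gets half the loading volume. The function name, the maximum well volume and the readings are hypothetical.

```python
def loading_volumes(a600_values, max_volume_ul):
    """Volumes (ul) giving equal A600 x volume per lane.

    The least dense culture receives max_volume_ul; denser cultures get
    proportionally less, so every lane carries the same cell equivalents.
    """
    if max_volume_ul <= 0 or any(a <= 0 for a in a600_values):
        raise ValueError("absorbances and volume must be positive")
    lowest = min(a600_values)
    return [max_volume_ul * lowest / a for a in a600_values]


# Hypothetical final readings for two strains and a 20 ul well.
readings = {"JB590": 0.82, "BE100": 0.76}
for strain, vol in zip(readings, loading_volumes(list(readings.values()), 20.0)):
    print(f"{strain}: load {vol:.1f} ul")
```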

To monitor MsrP expression levels after NaOCl or H2O2 treatment, overnight
cultures of wild-type cells (MG1655) were diluted to A600 nm = 0.04 in
fresh lysogeny broth (LB) medium (100 ml) and grown aerobically at 37°C to
A600 nm = 0.5. NaOCl (2 mM) or H2O2 (1 mM) was then added to the cultures.
Samples were TCA-precipitated, washed with ice-cold acetone, dried, suspended
in SB buffer, heated at 95°C and loaded on an SDS-PAGE gel for immunoblot
analysis. The protein amounts loaded were standardized by taking into account
the A600 nm values of the cultures. The specificity of the anti-MsrP antibody was
verified (Supplementary Fig. 5).
Preparation of pure diastereoisomeric forms of Met-O. L-Methionine sulfoxide 
([a]p74 = + 14.3° (water)), triethylamine (>99%) and methanol (>99.6%) were 
obtained from Sigma-Aldrich, picric acid from Prolabo and D2O from SDS. Water
was purified using a Millipore Elix Essential 3 apparatus. ¹H and ¹³C NMR spectra were
recorded on a Bruker Avance III Nanobay spectrometer (¹H: 400 MHz; {¹H}¹³C:
100 MHz). Chemical shifts (δ) were referenced to dioxane (¹H: δ = 3.75 p.p.m.;
¹³C: δ = 67.19 p.p.m.)²⁶, which was added as an internal reference; resonances are
detailed as follows: ¹H, δ in parts per million (multiplicity, J-coupling in hertz,
integration, signal attribution); {¹H}¹³C, δ in parts per million (signal attribution).
For each diastereoisomer, chemical shifts are similar to those previously reported²⁷.
13C resonance assignments were confirmed by heteronuclear single quantum 
coherence experiments. Optical rotations were measured on an Anton Paar 
Modular Circular Polarimeter 200 instrument at 25°C and 589 nm from aqueous 
solution containing 0.8-1.2 g per 100 ml of L-methionine sulfoxide. The values 
reported are the average and s.d. relative to three independent measurements 
recorded on distinct solutions. 

The commercial mixture of diastereoisomers was separated following the 
previously reported method²⁸. Briefly, 10 ml of water was added to L-methionine
sulfoxide (1.333 g, 8.069 mmol) and picric acid (1.849 g, 8.071 mmol). The sus- 
pension was heated to reflux until complete dissolution and then slowly cooled to 
room temperature (~25 °C). The suspension was filtered on a sintered funnel and 
the solid was washed with cold water (10 ml in total). Both the solid (dextro) and 
filtrate (Jevo) were collected separately for further purification. 

Dextro. To the dried solid, 20 ml of water were added and the mixture was heated
to reflux, then allowed to cool slowly to room temperature. The solid was filtered
off, washed with 10 ml of water and dried. Next, 11 ml of methanol were added
to the resulting solid and the mixture was heated to reflux. After slow cooling, the
yellow crystals were filtered, washed with 5 ml of methanol and dried. A portion
was used for structure determination by X-ray analysis. To the dextrorotatory picrate
salt (1.345 g, 3.42 mmol), ~1.1 equivalents of triethylamine were added as a dilute
aqueous solution (22 ml, 175 mM, 3.85 mmol). Subsequently, 200 ml of acetone
were added portion-wise to the stirred suspension and a white solid precipitated.
This was filtered, washed, triturated with acetone and finally dried under
vacuum (533 mg, 80%).


© 2015 Macmillan Publishers Limited. All rights reserved


LETTER

Levo. The volume of the filtrate was reduced under vacuum at 40 °C to about 3–4 ml
to obtain a saturated solution and a small amount of precipitate. Then, 1.5 ml of
water were added, the suspension was filtered and the solid was washed with a
minimal volume of water (2 ml). This step (volume reduction, dilution, filtration
and washing) was repeated once, and the resulting solution was then dried completely
under vacuum. To the resulting yellow residue, 15 ml of methanol were added and
the suspension was heated to reflux. In our hands, no solid precipitated upon cooling
(in contrast with the reported method (ref. 28)); therefore, the solution was dried
again under vacuum. Following the same protocol as before, to the levorotatory-enriched
picrate salt (1.354 g, 3.44 mmol), ~1.1 equivalents of triethylamine were added as a
concentrated aqueous solution (3.8 ml, 1 M, 3.8 mmol). Afterwards, 200 ml of acetone
were added portion-wise and a white solid precipitated. This was filtered, washed,
triturated with acetone and finally dried under vacuum (515 mg, 77%).
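The stoichiometry and yields quoted above can be checked with simple arithmetic. In this sketch the molar mass of methionine sulfoxide (C5H11NO3S, ~165.21 g mol⁻¹) is a standard value. The assumption that each diastereoisomer accounts for half of the 8.069 mmol starting material is ours; under that assumption the 533 mg and 515 mg products correspond to the reported 80% and 77% yields.

```python
MW_MET_O = 165.21   # g/mol, L-methionine sulfoxide (C5H11NO3S)
START_MMOL = 8.069  # mmol of commercial L-Met-O used for the picrate separation


def equivalents(base_mmol, salt_mmol):
    """Molar equivalents of triethylamine relative to the picrate salt."""
    return base_mmol / salt_mmol


def percent_yield(product_mg, start_mmol=START_MMOL, mw=MW_MET_O, fraction=0.5):
    """Yield of one diastereoisomer, assuming it makes up `fraction` of the
    starting material (an assumed ~1:1 R/S commercial mixture)."""
    product_mmol = product_mg / mw
    theoretical_mmol = start_mmol * fraction
    return 100.0 * product_mmol / theoretical_mmol


# Dextro: 22 ml of 175 mM triethylamine onto 3.42 mmol of picrate salt
dextro_eq = equivalents(22 * 0.175, 3.42)
# Levo: 3.8 ml of 1 M triethylamine onto 3.44 mmol of picrate salt
levo_eq = equivalents(3.8 * 1.0, 3.44)
```

Both equivalents round to 1.1, matching the "~1.1 equivalents" stated for the two work-ups.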

Dextro (L-methionine-S-sulfoxide): [α]D²⁵ = +99.2 ± 1.5° (water); ¹H NMR
(400 MHz, D2O, pD = 6.5): 3.88 (t, ³J = 6.3, 1H, HαS), 3.02 (m, 2H, HγS), 2.74
(s, 3H, HεS), 2.31 (dd, J = 14.4, 7.6, 2H, HβS); {¹H}¹³C NMR (100 MHz, D2O):
173.8 (COOS), 54.0 (CαS), 48.9 (CγS), 37.2 (CεS), 24.4 (CβS). Literature values from
ref. 28: [α]D = +99° (water); from ref. 27: [α]D = +98.2° (water, room
temperature); ¹H NMR (300 MHz, D2O): 4.10 (m, 1H), 3.08–2.78 (m, 2H), 2.59 (s, 3H),
2.32–2.13 (m, 2H); ¹³C NMR (75 MHz, D2O): 171.1, 52.0, 48.3, 37.0, 23.5.

Levo (L-methionine-R-sulfoxide): [α]D²⁵ = −72.7 ± 0.5° (water); ¹H NMR
(400 MHz, D2O, pD = 6.5): 3.86 (t, ³J = 6.3, 1H, HαR), 3.12 (ddd, J = 13.4, 9.6, 7.0,
1H, HγR or HγR′), 3.02 (m, 2H, HγS), 2.93 (ddd, J = 13.5, 9.1, 6.8, 1H, HγR′ or
HγR), 2.74 (s, 3H, HεR), 2.31 (m, 2H, HβR); {¹H}¹³C NMR (100 MHz, D2O): 173.9
(COOR), 54.2 (CαR), 54.0 (CαS), 48.9 (CγR), 37.2 (CεS), 37.0 (CεR), 24.4 (CβR).
Literature values from ref. 28: [α]D = −71.6° (water); from ref. 27: [α]D = −78°
(water, room temperature); ¹H NMR (300 MHz, D2O): 4.10 (m, 1H), 3.08–2.78
(m, 2H), 2.59 (s, 3H), 2.32–2.13 (m, 2H); ¹³C NMR (75 MHz, D2O): 171.1, 52.1,
48.4, 37.0, 23.7.

In the ¹H NMR spectra, the resonance centred at 3.02 p.p.m. was attributed to
the S-diastereoisomer. The relative integral values suggest that R-Met-O is
contaminated by 3% of the S-diastereoisomer. Moreover, comparing the measured [α]D
values with those reported in ref. 27, the data are consistent with the presence
of 3% S-diastereoisomer as a contaminant. Such purity is in line with previous
reports using the same separation method”®*”’. The absolute configuration of
L-methionine-S-sulfoxide was confirmed by X-ray structural analysis and matches
previous assignments”””.
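The ~3% estimate from optical rotation follows if the specific rotation of an R/S mixture is treated as a mole-fraction-weighted average of the pure-isomer values from ref. 27 (+98.2° and −78°). This linear-mixing model is an assumption of the sketch below, not a procedure stated in the text.

```python
# Specific rotations of the pure diastereoisomers reported in ref. 27 (water).
ALPHA_S = 98.2
ALPHA_R = -78.0


def s_fraction_from_rotation(alpha_obs, alpha_s=ALPHA_S, alpha_r=ALPHA_R):
    """Mole fraction of S-Met-O in an R/S mixture, assuming the observed specific
    rotation is a linear (mole-fraction-weighted) mix of the pure-isomer values.
    The two isomers share one molar mass, so mole and mass fractions coincide."""
    return (alpha_obs - alpha_r) / (alpha_s - alpha_r)


s_in_levo = s_fraction_from_rotation(-72.7)  # measured value for the levo sample
```

With the measured −72.7°, the function returns ~0.030, that is, ~3% S-diastereoisomer. Applied to the dextro sample (+99.2°), the same formula gives a value slightly above 1, so it cannot resolve contamination at that level.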

Synthesis of N-acetyl-Met-O. To synthesize N-acetyl-Met-O, Met-O (30 mg;
Sigma-Aldrich) was dissolved in 2 ml of 100% acetic acid. After addition of 2 ml
of 97% acetic anhydride, the mixture was incubated for 2 h at 23 °C. Then,
2 ml of water were added and the mixture was lyophilized overnight. Finally,
the lyophilized N-acetyl-Met-O was washed three times with 6 ml of water,
re-lyophilized and suspended in 500 mM Na2HPO4, pH 9.0, to a final concentration
of 1.5 M. The pH was then adjusted to 7 with NaOH.

Kinetic analysis of MsrP activity. MsrP reductase activity was followed
spectrophotometrically at 600 nm by monitoring the substrate-dependent
oxidation of reduced benzyl viologen, which served as the electron donor. Reactions
were performed anaerobically at 30 °C in degassed, nitrogen-flushed 50 mM
MOPS, pH 7.0, using stoppered cuvettes. Benzyl viologen was used at a final
concentration of 0.4 mM (molar extinction coefficient, ε, of reduced benzyl
viologen = 7,800 M⁻¹ cm⁻¹) and reduced with sodium dithionite. The final reaction
volume was kept constant, with the ordered addition of benzyl viologen, sodium
dithionite, 1–32 mM N-acetyl-methionine sulfoxide (NacMet-O) and 10 nM
MsrP-His6. The concentrations used for the R- and S-Met-O diastereoisomers
were 1–64 mM. The Michaelis–Menten parameters (maximum velocity (Vmax)
and Km) were determined using GraphPad Prism software.
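The kinetic workflow above can be sketched in Python: convert the A600 slope into a Met-O reduction rate with the Beer–Lambert law, fit the Michaelis–Menten equation, and convert Vmax into kcat. Several points are assumptions of this sketch rather than statements of the text. It assumes a 1 cm path length and two reduced benzyl viologen molecules oxidized per Met-O reduced (Met-O reduction is a two-electron process, and reduced benzyl viologen is a one-electron donor). It also assumes an MsrP-His6 mass of ~32.5 kDa, the value implied by the ratio of kcat to Vmax reported in Extended Data Fig. 3a. The fit is a plain least-squares stand-in (closed-form Vmax, golden-section search on log Km), not GraphPad Prism's algorithm.

```python
import math

EPS_BV = 7800.0   # M^-1 cm^-1, reduced benzyl viologen at 600 nm (from the text)
PATH_CM = 1.0     # assumed 1 cm cuvette path length
BV_PER_METO = 2   # assumed: two one-electron donors per two-electron Met-O reduction


def rate_uM_per_min(delta_a600_per_min, eps=EPS_BV, path_cm=PATH_CM, n=BV_PER_METO):
    """Convert the rate of A600 decrease into µM Met-O reduced per minute."""
    bv_molar_per_min = abs(delta_a600_per_min) / (eps * path_cm)  # Beer-Lambert law
    return bv_molar_per_min / n * 1e6


def mm_rate(s, km, vmax):
    """Michaelis-Menten initial rate, v = Vmax * S / (Km + S)."""
    return vmax * s / (km + s)


def _best_vmax(conc, rates, km):
    # For a fixed Km the model is linear in Vmax, so Vmax has a closed-form optimum.
    f = [s / (km + s) for s in conc]
    return sum(v * fi for v, fi in zip(rates, f)) / sum(fi * fi for fi in f)


def _sse(conc, rates, km):
    """Residual sum of squares at a given Km, with Vmax optimized."""
    vmax = _best_vmax(conc, rates, km)
    return sum((v - mm_rate(s, km, vmax)) ** 2 for s, v in zip(conc, rates))


def fit_michaelis_menten(conc, rates, km_lo=1e-3, km_hi=1e3, tol=1e-10):
    """Least-squares fit of Km and Vmax: golden-section search over log(Km)."""
    inv_phi = (math.sqrt(5) - 1) / 2
    a, b = math.log(km_lo), math.log(km_hi)
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if _sse(conc, rates, math.exp(c)) < _sse(conc, rates, math.exp(d)):
            b = d
        else:
            a = c
    km = math.exp((a + b) / 2)
    return km, _best_vmax(conc, rates, km)


def kcat_per_s(vmax_umol_min_mg, mw_da):
    """Turnover number from specific activity (µmol min^-1 mg^-1) and molecular mass."""
    return vmax_umol_min_mg * mw_da / 1000.0 / 60.0
```

Fitting noiseless rates generated with Km = 3.8 mM and Vmax = 56.3 returns those values. With the assumed 32.5 kDa mass, `kcat_per_s(56.3, 32500)` gives ~30.5 s⁻¹, in line with Extended Data Fig. 3a. Real data would need weighting and error estimates, which this sketch omits.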

Analysis of MsrA and MsrB activities. The reductase activities of MsrA and MsrB
were followed spectrophotometrically at 340 nm by monitoring the substrate-dependent
oxidation of NADPH (ε = 6,220 M⁻¹ cm⁻¹). Reactions were performed
at 37 °C in 20 mM HEPES-KOH, pH 7.4, 10 mM NaCl, and the final reaction
volumes were kept constant, with the ordered addition of 250 µM NADPH (Roche),
2.6 µM TrxR, 40 µM Trx, 64 mM substrate and 1.5 µM of either MsrA or MsrB.
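A similar conversion applies to the coupled NADPH assay. This sketch assumes a 1 cm path length and one NADPH oxidized per Met-O reduced (NADPH reduces TrxR, which reduces Trx, which in turn regenerates the Msr). The reaction volume and enzyme mass passed to `specific_activity` are hypothetical inputs.

```python
EPS_NADPH = 6220.0  # M^-1 cm^-1 at 340 nm (value given in the text)


def meto_rate_uM_per_min(delta_a340_per_min, path_cm=1.0):
    """Met-O reduced (µM min^-1) from the rate of A340 decrease, assuming one
    NADPH is consumed per Met-O reduced through the TrxR -> Trx -> Msr relay."""
    return abs(delta_a340_per_min) / (EPS_NADPH * path_cm) * 1e6


def specific_activity(rate_uM_per_min, volume_ml, enzyme_mg):
    """Specific activity in µmol min^-1 per mg of MsrA or MsrB."""
    umol_per_min = rate_uM_per_min * volume_ml / 1000.0  # µM x ml -> µmol
    return umol_per_min / enzyme_mg
```

Note that 250 µM NADPH corresponds to an absorbance of ~1.56 at 340 nm in a 1 cm cuvette (250 × 10⁻⁶ M × 6,220 M⁻¹ cm⁻¹), which bounds the total signal available in each assay.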
Identification of the periplasmic proteins repaired by MsrP using two-dimensional
LC-MS/MS. The identification of the MsrP substrates was performed
as follows. AG89 cells (2 l) were grown aerobically at 37 °C in terrific broth to
A600 nm = 0.8. Periplasmic extracts were prepared as described previously*!. Briefly,
cells were pelleted by centrifugation at 3,000g for 20 min at 4 °C and incubated on
ice with gentle shaking for 30 min in 100 mM Tris-HCl, pH 8.0, 20% sucrose, 1 mM
EDTA. This mixture also contained 20 mM N-ethylmaleimide to alkylate reduced 
cysteine residues in proteins to prevent their subsequent oxidation. Periplasmic 
proteins were then isolated by centrifugation of the cells at 3,000g for 20 min at 
4°C. The periplasmic extract was subsequently concentrated by ultrafiltration 
in an Amicon cell (3,000 Da cutoff, YM-3 membrane) and loaded on a PD-10 
column (GE Healthcare) equilibrated with 50 mM NaPi, pH 8.0, 50 mM NaCl. 
After concentration using a 5 kDa cutoff Vivaspin 4 (Sartorius) concentrator,
the extract was split into three samples. Two samples were incubated for
10 min at 37 °C with 2 mM NaOCl, whereas the third was left untreated to serve as the
reduced control. NaOCl was then removed by gel filtration using a NAP-5 column
(GE Healthcare) equilibrated with 50 mM MOPS, pH 7.0. The untreated sample
was also subjected to NAP-5 gel filtration.

One of the NaOCl-oxidized fractions was then reduced in vitro by incubation
for 1 h at 37 °C with 10 µM MsrP, 10 mM benzyl viologen and an excess of sodium
dithionite. The other NaOCl-oxidized fraction, used as an oxidized control, and the
non-oxidized fraction were incubated with 10 mM benzyl viologen and an excess
of sodium dithionite but without MsrP. The three samples were then desalted
by dialysis against 50 mM MOPS, pH 7.0, using Slide-A-Lyzer 3,500 MWCO
G2 cassettes (Thermo Scientific). The three samples (500 µg) were precipitated
by adding TCA to a final concentration of 10% w/v. The resulting pellets were
washed with ice-cold acetone, dried in a SpeedVac, suspended in 0.1 M NH4HCO3,
pH 8.0, digested overnight at 30 °C with 3 µg of sequencing-grade trypsin, and
analysed by two-dimensional LC-MS/MS essentially as described*. Briefly, peptides
were first separated on a first-dimension hydrophilic interaction liquid
chromatography (HILIC) column with a reverse acetonitrile gradient, and 25 fractions
of 1 ml were collected (2 min per fraction). After drying, peptides were analysed by
LC-MS/MS on a C18 column. The MS scan routine was set to analyse by MS/MS the five
most intense ions of each full MS scan; dynamic exclusion was enabled to ensure
detection of co-eluting peptides.

Protein identification by mass spectrometry. Raw data collection of approx- 
imately 230,000 MS/MS spectra per two-dimensional LC-MS/MS experiment 
was followed by protein identification using SEQUEST. All MS raw files have 
been deposited in the ProteomeXchange Consortium via the PRIDE partner 
repository with the data set identifier PXD002804. In detail, peak lists were 
generated using extract-msn (Thermo Scientific) within Proteome Discoverer
1.4.1. From raw files, MS/MS spectra were exported with the following settings:
peptide mass range 350–5,000 Da; minimal total ion intensity 500. The resulting
peak lists were searched using SEQUEST HT against a target-decoy E. coli
protein database (release 07.01.2008, 8,678 entries comprising forward and reverse
sequences) obtained from UniProt. The following parameters were used: trypsin
was selected with proteolytic cleavage only after arginine and lysine, the number of
missed (internal) cleavage sites was set to 1, mass tolerance for precursor and fragment
ions was 1.0 Da, and the dynamic modifications considered were +15.99 Da for
oxidized methionine and +125.12 Da for N-ethylmaleimide on cysteines. Peptide
matches were filtered using the q value and posterior error probability calculated
by the Percolator algorithm, ensuring an estimated false positive rate below 5%.
The filtered SEQUEST HT output files for each peptide were grouped according
to the protein from which they were derived using the multiconsensus results
tool within Proteome Discoverer. The peptide spectral match counts of
Met-containing peptides only were then combined from the three two-dimensional
LC-MS/MS experiments and exported to a Microsoft Excel spreadsheet, with
rows corresponding to peptide sequences and columns to HILIC fractions.
Oxidation of Met residues to Met-O by NaOCl causes a hydrophilic shift, which
alters the retention time of the affected peptides and makes them elute
later (4–8 min) than their reduced counterparts on a HILIC column. If these
Met-O residues are reduced by MsrP, the peptides show a hydrophobic shift and elute
at the same retention time on the HILIC column as in the control sample. By
comparing the retention times and the number of peptide spectral matches of
Met-O-containing peptides in a periplasmic extract under three experimental
conditions (control, and oxidized by NaOCl with or without MsrP), one can identify
‘bona fide’ potential MsrP substrates.
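The retention-shift logic can be expressed as a simple filter on the per-fraction peptide spectral match (PSM) counts exported from Proteome Discoverer. The sketch below is hypothetical. It summarizes each peptide by its PSM-weighted mean HILIC fraction, and its thresholds (`min_shift`, `max_residual`) are illustrative values derived from the 4–8 min shift and 2 min fractions mentioned above, not criteria reported by the authors.

```python
def weighted_mean_fraction(psm_counts):
    """PSM-weighted mean HILIC fraction index for one peptide.
    `psm_counts[i]` is the number of PSMs observed in fraction i."""
    total = sum(psm_counts)
    if total == 0:
        return None  # peptide not detected under this condition
    return sum(i * n for i, n in enumerate(psm_counts)) / total


def is_candidate(control, oxidized, repaired, min_shift=2, max_residual=1):
    """Hypothetical rule: NaOCl shifts the peptide at least `min_shift` fractions
    later (2 min per fraction, so 2 fractions ~ 4 min), and MsrP treatment brings
    it back to within `max_residual` fractions of the untreated control."""
    c, o, r = (weighted_mean_fraction(x) for x in (control, oxidized, repaired))
    if c is None or o is None or r is None:
        return False
    return (o - c) >= min_shift and abs(r - c) <= max_residual
```

A peptide that moves from fraction 5 to fraction 8 after NaOCl and returns to fraction 5 after MsrP treatment passes the filter; one that stays at fraction 8 after MsrP treatment does not.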

Protein expression and purification. TP1004 cells harbouring plasmid pAG178
and overexpressing MsrP-His6 protein were grown aerobically at 30 °C in
terrific broth (Sigma-Aldrich) supplemented with sodium molybdate (1.5 mM) and
ampicillin (200 µg ml⁻¹). When cells reached A600 nm = 0.8, expression was induced
with 0.1 mM IPTG for 3 h. Periplasmic proteins were then extracted as in ref. 32.
MsrP-His6 was then purified by loading the periplasmic extract on a 1 ml HisTrap
FF column (GE Healthcare) equilibrated with buffer A (NaPi 50 mM, pH 8.0, NaCl
300 mM). After washing the column with buffer A, MsrP-His6 was eluted by
applying a linear gradient of imidazole (from 0 to 300 mM) in buffer A. The fractions
containing MsrP-His6 were pooled, concentrated using a 5 kDa cutoff Vivaspin
15 (Sartorius) device and desalted on a PD-10 column (GE Healthcare)
equilibrated with 50 mM NaPi, pH 8.0, 150 mM NaCl.

VU-1 CaM, MsrA and MsrB were expressed and purified as described
previously (refs 33–35).

Trx was expressed and purified as follows. BL21 (DE3) cells harbouring plasmid
pMD205, overexpressing Trx with a carboxy (C)-terminal His6 tag, were grown
aerobically at 37 °C in LB supplemented with kanamycin (50 µg ml⁻¹). Expression
was induced at A600 nm = 0.6 with 1 mM IPTG for 3 h. Cells were then pelleted,
resuspended in buffer A (NaPi 50 mM, pH 8.0, NaCl 300 mM) and disrupted by
two passes through a French pressure cell at 12,000 psi. The lysate was then
centrifuged at 30,000g and 4 °C for 45 min to remove cell debris, and Trx was purified
as described for MsrP-His6. Ni-NTA-purified Trx was then loaded on a 120 ml
HiLoad 16/60 Superdex 75 PG column (GE Healthcare) previously equilibrated
with HEPES-KOH 50 mM, pH 7.4, NaCl 100 mM. The resulting Trx-containing
fractions were pooled and concentrated using a 5 kDa cutoff Vivaspin 15 device.

Thioredoxin reductase (TrxR) was expressed and purified as follows. BL21
(DE3) cells harbouring plasmid pPL223-2, overexpressing TrxR with an amino
(N)-terminal His6 tag, were grown aerobically at 37 °C in LB supplemented with
ampicillin (200 µg ml⁻¹). Expression was induced at A600 nm = 0.6 with 1 mM IPTG
for 3 h. Protein extraction was performed as described for Trx and purification was
performed as described for MsrP-His6.

BL21 (DE3) cells harbouring plasmid pKD11, overexpressing SurA with a
C-terminal His6 tag, were grown aerobically at 37 °C in LB supplemented with
kanamycin (50 µg ml⁻¹). Expression was induced at A600 nm = 0.6 with 1 mM IPTG for
3 h. Protein extraction and purification were performed as described for MsrP-His6.

MG1655 cells harbouring plasmid pKD84, overexpressing SurA with a
C-terminal Strep-tag, were grown aerobically at 37 °C in LB supplemented with
ampicillin (200 µg ml⁻¹). Expression was induced at A600 nm = 0.7 with a final
concentration of 200 µg l⁻¹ anhydrotetracycline (AHT) for 5 h. Protein extraction was
performed as described for MsrP-His6. SurA-Strep was then purified by loading
the periplasmic extract on a 5 ml Strep-Tactin Superflow cartridge H-PR (IBA)
equilibrated with buffer A (Tris-HCl 100 mM, pH 8.0, NaCl 150 mM, EDTA 1 mM).
After washing the column with buffer A, SurA-Strep was eluted by applying a linear
gradient of desthiobiotin (from 0 to 2.5 mM) in buffer A. The fractions containing
SurA-Strep were pooled, concentrated using a 5 kDa cutoff Vivaspin 15 (Sartorius)
device and desalted on a PD-10 column (GE Healthcare) equilibrated with 50 mM
NaPi, pH 8.0, 150 mM NaCl.

A modified version of Pal lacking the signal sequence and in which the first
cysteine of the lipobox was replaced by an alanine (Palci,) was expressed with
an N-terminal His6 tag from the pEB0513 vector in BL21 (DE3) cells. Cells were
grown aerobically at 37 °C in LB supplemented with ampicillin (200 µg ml⁻¹).
Expression was induced at A600 nm = 0.6 with 1 mM IPTG for 3 h. Protein extraction
was performed as described for Trx and purification was performed as described
for MsrP-His6.
In vitro repair of oxidized CaM, SurA and Pal by MsrP. CaM was oxidized
in vitro as described previously (ref. 36). SurA-His6 and Pal were oxidized in vitro by
incubating the purified proteins (50 µM) for 2 h 30 min at 30 °C with 100 mM H2O2
in a buffer containing 50 mM NaPi, pH 8.0, 50 mM NaCl. H2O2 was then removed
by gel filtration using a NAP-5 column (GE Healthcare) equilibrated with 50 mM
NaPi, pH 8.0, 150 mM NaCl.

In vitro repair of oxidized CaM (CaMox), SurA (SurAox) and Pal (Palox) was
assessed by incubating the oxidized proteins (2 µM of CaMox and SurAox, 5 µM
of Palox) with purified MsrP-His6 (2 µM for CaMox and SurAox, 5 µM for
Palox), 10 mM benzyl viologen and an excess of sodium dithionite at 37 °C for 1 h. As
controls, the oxidized proteins were incubated separately with either MsrP-His6
or the inorganic reducing system (benzyl viologen and sodium dithionite). The
reactions were stopped by adding SB buffer and heating at 95 °C for the CaM
and SurA samples, or by adding 0.1% trifluoroacetic acid for the Pal samples. The
CaM and SurA samples were then loaded on an SDS-PAGE gel and the proteins
visualized with PageBlue Protein Staining Solution (Fermentas). For the Pal
samples (20 µg), proteins were separated by reverse-phase high-performance liquid
chromatography on a C4 column (Vydac 214TP54, 4.6 mm × 250 mm) at a flow
rate of 400 µl min⁻¹ with a linear gradient of acetonitrile in 0.1% trifluoroacetic
acid (0–70% acetonitrile in 90 min). Absorbance was monitored at 214 nm and
the peaks were collected. The fractions were dried in a SpeedVac and the
proteins resuspended in 25 µl of 100 mM NH4HCO3 before overnight digestion at
30 °C with 0.5 µg of trypsin or EndoGlu-C. The peptides were then analysed as
described below.

For CaM and SurA, the gel bands corresponding to the different oxidation states
were in-gel digested with trypsin and the resulting peptides analysed by LC-MS/MS
on a C18 reverse-phase column as described above. Relative abundances of each
Met-containing peptide in its different oxidation states were obtained by integrating
peak areas, taking into account the extracted ion chromatograms of
both doubly and triply charged ions.
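The relative-abundance calculation can be sketched as follows: peak areas from the doubly and triply charged ions of each form are summed, and each oxidation state is expressed as a fraction of the total. The input structure is hypothetical. The sketch also assumes, as such label-free comparisons usually do, that oxidized and reduced forms ionize with similar efficiency.

```python
def state_fractions(areas):
    """Fraction of a peptide's signal in each oxidation state.

    `areas` maps (oxidation_state, charge) -> integrated XIC peak area.
    Only doubly and triply charged ions are counted, as in the text."""
    by_state = {}
    for (state, charge), area in areas.items():
        if charge in (2, 3):
            by_state[state] = by_state.get(state, 0.0) + area
    total = sum(by_state.values())
    if total == 0:
        raise ValueError("no 2+ or 3+ signal for this peptide")
    return {state: a / total for state, a in by_state.items()}
```

For example, a peptide with 2+/3+ areas of 300/100 in its reduced form and 450/150 in its oxidized form would be reported as 40% reduced and 60% oxidized.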

In vivo repair of oxidized SurA and Pal by MsrP. The in vivo repair of SurAox
and Palox by the MsrPQ system or by MsrP alone, expressed from plasmids pAG195
and pAG192, respectively, was performed as follows. Overnight cultures of AG233
(containing the empty pAG177 vector), AG234 (containing the pAG195 plasmid)
and AG289 (containing the pAG192 plasmid) were diluted to A600 nm = 0.04 into
fresh LB medium (100 ml) and cells were grown aerobically at 37 °C in the presence
of 0.1 mM IPTG and 200 µg ml⁻¹ ampicillin. At A600 nm = 0.5, cells were subjected
to NaOCl treatment (3.5 mM) and protein synthesis was blocked by the addition
of chloramphenicol (300 µg ml⁻¹). Samples were taken at different time points
after NaOCl addition and precipitated with TCA. The pellets were then washed
with ice-cold acetone, suspended in SB buffer, heated at 95 °C and loaded on an
SDS-PAGE gel for immunoblot analysis using anti-Pal (ref. 37) and anti-SurA antibodies.
The specificity of the anti-SurA antibody was verified (Supplementary Fig. 6).
The protein amounts loaded were standardized by taking into account the A600 nm
values of the cultures.

Oxidation, repair and purification of SurA for analysis of chaperone function.
SurA-Strep was oxidized in vitro by incubating the purified protein (200 µM)
for 3 h at 30 °C with 100 mM H2O2 in a buffer containing 50 mM NaPi, pH 8.0,
150 mM NaCl. H2O2 was then removed by gel filtration using a NAP-5 column
(GE Healthcare) equilibrated with 50 mM NaPi, pH 8.0, 150 mM NaCl. For the
in vitro repair of oxidized SurA (SurAox), the oxidized protein (30 µM) was
incubated with purified MsrP-His6 (30 µM), 10 mM benzyl viologen and 10 mM
sodium dithionite at 37 °C for 1 h. Following repair, SurA was purified by passing
the sample through a gravity flow column containing 200 µl of Strep-Tactin Sepharose
beads (from a 50% suspension, IBA), previously equilibrated with buffer A
(Tris-HCl 100 mM, pH 8.0, NaCl 150 mM, EDTA 1 mM). After washing with buffer A,
repaired SurA was eluted using buffer A containing 2.5 mM desthiobiotin. The
elution fractions were pooled and subjected to buffer exchange using a NAP-5
column (GE Healthcare) equilibrated with 50 mM NaPi, pH 8.0, 150 mM NaCl.
To verify the correct oxidation, repair and purification of SurA, samples were
loaded on an SDS-PAGE gel and the proteins visualized with PageBlue Protein
Staining Solution (Fermentas).

Analysis of chaperone function. The ability of SurA to act as a chaperone
preventing the thermal aggregation of citrate synthase (Sigma, reference C3260) was
assessed as follows. The aggregation of citrate synthase (0.15 µM) was monitored at
43 °C in 40 mM HEPES-KOH, pH 7.5, in the absence or presence of 0.6 µM
SurA, SurAox or MsrP-repaired SurAox using light-scattering measurements. To
avoid effects that might be caused by the protein buffer, all samples were
added to the assay in a constant volume. SurAox and MsrP-repaired SurAox were
obtained as described above. Light-scattering measurements were made using a
Varian Cary Eclipse spectrofluorometer with both excitation and emission
wavelengths set to 500 nm and a spectral bandwidth of 2.5 nm. Data points were
recorded every 0.1 s.

Genetic analysis of Met-O assimilation. The ability of various E. coli strains
(BE100, JB08, CH193, BE104) to assimilate Met-O was assessed on M9 minimal
medium supplemented with either Met or Met-O at 20 µg ml⁻¹. Plates were
incubated at 37 °C for 72 h. Overnight cultures of strains AG272, AG273, AG279 and
AG274 were diluted to A600 nm = 0.04 into fresh M63 minimal medium (100 ml)
supplemented with 0.5% glycerol, 150 µg ml⁻¹ of each amino acid, 1 mM MgSO4,
1 mM Na2MoO4, 171M Fe2(SO4)3, vitamins (thiamine 10 µg ml⁻¹, biotin 1 µg ml⁻¹,
riboflavin 10 µg ml⁻¹ and nicotinamide 10 µg ml⁻¹) and 100 µg ml⁻¹
spectinomycin, and grown aerobically at 37 °C. When A600 nm reached 0.5, cells (5 ml) were
washed three times with M63 medium containing 150 µg ml⁻¹ Met-O instead of
methionine, and serially diluted in the same medium. Five microlitres of each
dilution were then spotted on M63 plates containing either Met or Met-O at
150 µg ml⁻¹, and plates were subsequently incubated at 37 °C for 40 h.

HOCl induction assays. The msrP::lacZ-containing strains (CH183, CH186 and
CH187) were grown at 37 °C with shaking in M9 minimal medium. When cells
reached A600 nm ≈ 0.2, cultures were split into two plastic tubes, one of them
containing HOCl (200 µM). These tubes were then incubated at an inclination of 90°
with shaking at 37 °C. After 30 min of incubation, 1 ml was harvested and the
bacteria were resuspended in 1 ml of β-galactosidase buffer. β-Galactosidase levels
were measured as described (ref. 38).
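The β-galactosidase protocol of ref. 38 is not reproduced in the text. Assuming it follows the widely used Miller-unit calculation, activities and the HOCl induction ratio could be computed as in this sketch. The formula and the function names are assumptions for illustration.

```python
def miller_units(a420, a550, a600, minutes, volume_ml):
    """Classic Miller units: 1000 x (A420 - 1.75 x A550) / (t x v x A600).
    A550 corrects A420 for light scattering by cell debris; t is the reaction
    time in minutes and v the volume of culture assayed in ml."""
    return 1000.0 * (a420 - 1.75 * a550) / (minutes * volume_ml * a600)


def induction_ratio(treated_units, untreated_units):
    """Fold induction of msrP::lacZ by HOCl relative to the untreated tube."""
    return treated_units / untreated_units
```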

HOCl survival assays. NR744, NR745, CH0127 and AG190 cells were grown
aerobically at 37 °C with shaking in 50 ml of LB medium in 500 ml flasks. When
cells reached A600 nm ≈ 0.45, 5 ml samples were transferred to conical
polypropylene centrifuge tubes (50 ml; Sarstedt) and HOCl (2 mM) was added. Cells were
then incubated at 37 °C with shaking (150 r.p.m.) at 90° inclination. Samples were
taken at various time points after stress, diluted in PBS buffer, spotted on LB
agar and incubated at 37 °C for 16 h. Cell survival was determined by counting
colony-forming units (c.f.u.) per millilitre. The absolute c.f.u. at time-point 0 (used
as 100%) was ~10⁸ cells per millilitre in all experiments. For strains CH194, CH196
and CH197, the same protocol was used with chloramphenicol (25 µg ml⁻¹) and
arabinose (0.2%) added to the cultures.
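Survival percentages follow from c.f.u. counts normalized to time point 0. In this sketch the dilution factor and the spotted volume (`spot_volume_ml`) are hypothetical parameters, since the text does not state them.

```python
def cfu_per_ml(colonies, dilution_factor, spot_volume_ml):
    """Colony-forming units per ml of the original culture."""
    return colonies * dilution_factor / spot_volume_ml


def percent_survival(cfu_t, cfu_0):
    """Survival at time t relative to time point 0 (taken as 100%)."""
    return 100.0 * cfu_t / cfu_0
```

For instance, 10 colonies from a hypothetical 10 µl spot of a 10⁵-fold dilution correspond to 10⁸ c.f.u. per ml, the order of magnitude reported at time point 0.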

SDS survival assays. Cells (MG1655 and BE107) were grown at 37 °C with shaking
in 10 ml of LB (in 100 ml flasks). When cells reached A600 nm ≈ 0.8, 5 ml samples
were transferred to conical polypropylene centrifuge tubes (50 ml, Sarstedt) and
HOCl (2 mM) was added. After 5 min of incubation, samples were taken and
diluted in PBS buffer to ~2 x 10° cells per millilitre. Aliquots (100 µl) were then
spread on LB agar plates containing SDS (1%). Colonies were counted the next day.
Data set construction and phylogenetic analyses. A non-redundant local
protein database containing 1,342 complete prokaryotic proteomes available in NCBI
(http://www.ncbi.nlm.nih.gov/) as of 30 July 2014 was built. This database was
queried with the BlastP program (default parameters; ref. 39), using YedY (NP_416480) and
YedZ (NP_416481) of E. coli strain K-12 substrain MG1655 as seeds. Distinction
between homologous and non-homologous sequences was assessed by visual
inspection of each BlastP output (no arbitrary cut-off on the E value or score). To
ensure that we did not overlook divergent YedY or YedZ proteins, iterative BlastP
queries were performed using homologues identified at each step as new seeds.
The list of YedY and YedZ homologues is provided in Supplementary Data 1. The
retrieved sequences were aligned using MAFFT version 7 (default parameters; ref. 40;
Supplementary Data 2 and 3). Each alignment was visually inspected and manually
refined when necessary using the ED program from the MUST package (ref. 41). Regions
where the homology between amino-acid positions was doubtful were removed
using BMGE software (BLOSUM30 similarity matrix; ref. 42).

For each homologue, the genomic context was investigated using MGcV
(Microbial Genomic context Viewer; ref. 43). The domain composition and protein
location of each homologue were also analysed using pfam version 27.0 (ref. 44),
SignalP version 4.1 (ref. 45) and TMHMM server version 2.0 (ref. 46), respectively.

For the YedY protein, preliminary phylogenetic analysis used FastTree version 2
and a gamma distribution with four categories (ref. 47). On the basis of the resulting tree,
the subfamily containing the sequence from E. coli was identified and selected for 
further phylogenetic investigations. The corresponding sequences were realigned 
using MAFFT version 7. The resulting alignment was trimmed with BMGE as 
previously described. 

Maximum likelihood trees were computed using PhyML version 3.1 (ref. 48)
with the Le and Gascuel (LG) model (amino-acid frequencies estimated from the data
set) and a gamma distribution (four discrete categories of sites and an estimated
alpha parameter) to take into account variations in evolutionary rate across sites.
Branch robustness was estimated by the non-parametric bootstrap procedure
implemented in PhyML (100 replicates of the original data set with the same
parameters). Bayesian inferences were performed using MrBayes 3.2 (ref. 49) with
a mixed model of amino-acid substitution including a gamma distribution (four
discrete categories). MrBayes was run with four chains for one million generations
and trees were sampled every 100 generations. To construct the consensus tree, the
first 2,000 trees were discarded as burn-in.
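The burn-in setting can be put in proportion with a few lines of arithmetic (per MCMC run, ignoring any sample of the starting tree). One million generations sampled every 100 generations yield 10,000 trees, so discarding the first 2,000 removes 20% of the samples, covering the first 200,000 generations.

```python
NGEN = 1_000_000      # MCMC generations (from the text)
SAMPLEFREQ = 100      # one tree sampled every 100 generations
BURNIN_TREES = 2_000  # trees discarded before building the consensus

sampled_trees = NGEN // SAMPLEFREQ              # trees sampled per run
burnin_fraction = BURNIN_TREES / sampled_trees  # share of samples discarded
burnin_generations = BURNIN_TREES * SAMPLEFREQ  # generations covered by burn-in
retained_trees = sampled_trees - BURNIN_TREES   # trees used for the consensus
```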
Extended Data Figure 1 | Induction of MsrPQ by HOCl is dependent
on the presence of a functional YedVW two-component system. Top,
immunoblot analysis shows that the induction of MsrP synthesis by HOCl
(0.2 mM) is yedW-dependent. The image is representative of experiments
made in biological triplicate. Bottom, an msrP::lacZ fusion was used as a
read-out for msrP expression. Deletion of yedV upregulates msrP expression,
while deletion of yedW prevents its induction by HOCl. Error bars,
mean ± s.e.m.; n = 4. The uncropped blot is shown in Supplementary Fig. 4.
Extended Data Figure 2 | Respiratory chain-powered, non-stereospecific
reduction of Met-O in periplasmic proteins by the
MsrPQ system maintains envelope integrity. Upon exposure to reactive
chlorine species (RCS) and/or reactive oxygen species (ROS),
methionine residues (Met) in periplasmic proteins such as SurA and Pal
become oxidized, randomly forming either the R- or the S-diastereoisomer
of Met-O. This results in the loss of function of some proteins important
for maintaining the integrity of the envelope, such as SurA, giving rise to
envelope defects. MsrP catalyses the reduction of both diastereoisomers
of Met-O with the help of its molybdenum-molybdopterin (Mo-MPT)
cofactor. Electrons for reduction are provided by the quinone (Q) pool of
the respiratory chain through MsrQ, the inner membrane haem
b-containing partner of MsrP. PG, peptidoglycan.
Extended Data Figure 3 | MsrP non-stereospecifically reduces Met-O.
a, MsrP reduces N-acetyl-Met-O (NacMet-O), a substrate mimicking
protein-bound Met-O, with Km = 3.8 ± 1.2 mM, turnover number
(kcat) = 30.5 ± 3.1 s⁻¹ and Vmax = 56.3 ± 5.8 µmol min⁻¹ per milligram
protein (error bars, mean ± s.d.; n = 3). b, MsrP is a non-stereospecific
Msr, being able to reduce both S-Met-O (with Km = 8.0 ± 2.7 mM,
kcat = 36.0 ± 3.6 s⁻¹ and Vmax = 67.2 ± 6.4 µmol min⁻¹ per milligram
protein) and R-Met-O (with Km = 25.7 ± 4.7 mM, kcat = 168.3 ± 15.0 s⁻¹
and Vmax = 313.4 ± 27.6 µmol min⁻¹ per milligram protein). Error bars,
mean ± s.d.; n = 3. c, Strain JB08 (Met⁻ MsrA⁻ MsrB⁻ BisC⁻, producing
MsrC) is able to grow only on R-Met-O, whereas strain CH193
(Met⁻ MsrA⁻ MsrB⁻ MsrC⁻, producing BisC) is only able to grow on
S-Met-O. Deletion of msrP in strain BE100 (Met⁻ Msr⁻ SupMet-O)
prevents its growth on R- and S-Met-O (strain BE104 = Met⁻ Msr⁻
SupMet-O ΔmsrP; compare with growth of BE100 in Fig. 2e). Images
are representative of experiments made in biological triplicate. d, The
periplasmic chaperone SurA was treated with H2O2, giving rise to
SurAox, a sample of which was subsequently incubated with MsrP and the
inorganic reducing system in vitro. The oxidation state of specific Met
residues (Met 136, 231 and 298) in the various samples was determined by
LC-MS/MS analysis. Error bars, mean ± s.e.m.; n = 4.
Extended Data Figure 4 | Preparation of pure diastereoisomeric forms
of Met-O. a, Oak Ridge Thermal Ellipsoid Plot (ORTEP) representation,
at the 50% probability level, of the crystal structure of the isolated salt of
L-methionine-S-sulfoxide (right) and picrate (left). The grey, blue, red, white
and yellow spheres represent carbon, nitrogen, oxygen, hydrogen and sulfur
atoms, respectively. b, ChemDraw representation of L-methionine-R,S-sulfoxide
with proton and carbon positions (relative to the NMR assignment). c, Zoom on
the ¹H NMR spectra of ~150 mM solutions of L-methionine sulfoxide in D2O,
pD 6.5, either as a mixture of R- and S-diastereoisomers (top), isolated S-
(middle) or isolated R- (bottom) (containing 30 mM dioxane as an internal
reference). d, Zoom on the ¹³C NMR spectra of ~150 mM solutions of
L-methionine sulfoxide in D2O, pD 6.5, either as a mixture of R- and
S-diastereoisomers (top), isolated S- (middle) or isolated R- (bottom)
(containing 30 mM dioxane as an internal reference).
Extended Data Figure 5 | Individual phylogenies of YedY. Shown are
unrooted Bayesian phylogenetic trees for YedY (b1971, 310 sequences,
260 positions). Numbers at nodes indicate posterior probabilities
computed by MrBayes (ref. 49) and bootstrap values computed by PhyML (ref. 48).
Only posterior probabilities and bootstrap values above 0.5 and 50%,
respectively, are shown. Scale bars, average number of substitutions per
site. In the phylogenetic tree, YedY from E. coli is highlighted in grey.
Extended Data Figure 6 | Individual phylogenies of YedZ. Shown are
unrooted Bayesian phylogenetic trees for YedZ (b1972, 369 sequences,
135 positions). Numbers at nodes indicate posterior probabilities
computed by MrBayes (ref. 49) and bootstrap values computed by PhyML (ref. 48).
Only posterior probabilities and bootstrap values above 0.5 and 50%,
respectively, are shown. Scale bars, average number of substitutions per
site. In the phylogenetic tree, YedZ from E. coli is highlighted in grey.
Extended Data Table 1 | The MsrPQ system uses electrons from the respiratory chain to reduce free Met-O


Strain description                          Met   Met-O
Met                                         +     +
Met⁻ Msr⁻ (JB590)                           +     −
Met⁻ Msr⁻ SupMet-O (BE100)                  +     +
Met⁻ Msr⁻ SupMet-O ΔyedZ (BE105)            +     −
Met⁻ Msr⁻ empty vector (AG272)              +     −
Met⁻ Msr⁻ pyedY (AG273)                     +     −
Met⁻ Msr⁻ pyedZ (AG279)                     +     −
Met⁻ Msr⁻ pyedYyedZ (AG274)                 +     +
Met⁻ Msr⁻ SupMet-O ΔmenA ΔubiE (BE106)      +     −


This table shows the ability of the various strains to grow (+) or not (−) using Met-O as the sole Met source. Strains were grown for 40–72 h at 37 °C. The results are representative of experiments made
in biological triplicate.
Extended Data Table 2 | List of proteins identified as potential MsrP substrates

Protein   Function                                             Number (percentage) of methionines in the protein*
SurA      Primary periplasmic chaperone                        14 (3.4%)
LolA      Outer-membrane lipoprotein carrier protein           2 (1.1%)
Pal       Peptidoglycan-associated lipoprotein                 6 (3.9%)
MlaC      Probable phospholipid-binding protein                4 (2.1%)
PpiA      Peptidyl-prolyl cis-trans isomerase A                4 (2.4%)
DsbA      Thiol:disulfide interchange protein                  6 (3.2%)
CysP      Thiosulfate-binding protein                          4 (1.3%)
PotD      Spermidine/putrescine-binding periplasmic protein    9 (2.8%)
MppA      Periplasmic murein peptide-binding protein           7 (1.4%)
ProX      Glycine betaine-binding periplasmic protein          6 (1.9%)
MalE      Maltose-binding periplasmic protein                  6 (1.6%)
MglB      D-galactose-binding periplasmic protein              6 (1.9%)
RbsB      D-ribose-binding periplasmic protein                 4 (1.5%)
FecB      Fe3+ dicitrate-binding periplasmic protein           7 (2.5%)
RcnB      Nickel/cobalt homeostasis protein                    2 (2.3%)
ZnuA      High-affinity zinc uptake system protein             6 (2.1%)
Ecotin    General inhibitor of pancreatic serine proteases     4 (2.8%)
Ivy       Inhibitor of vertebrate lysozyme                     5 (3.9%)
PspE      Thiosulfate sulfurtransferase                        2 (2.4%)
YmgD      Uncharacterized protein                              4 (4.4%)

*Referring to the mature protein without its signal sequence.


Semi-quantitative two-dimensional LC-MS/MS analysis was used to identify proteins that have one or more oxidized Met residues that MsrP could reduce. The first column indicates the name 
of the protein, the second describes its function and the third gives the number and percentage of methionine residues in the mature protein (excluding the signal sequence). 
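The third column is simple arithmetic on the mature sequence. A minimal sketch of that calculation, assuming the mature protein is obtained by trimming a known number of N-terminal signal-peptide residues from the precursor; the function name and the toy sequence are hypothetical illustrations, not data from the study:

```python
def methionine_content(precursor: str, signal_peptide_length: int) -> tuple[int, float]:
    """Return (Met count, Met percentage) for the mature protein.

    The mature protein is taken to be the precursor sequence (one-letter
    amino-acid code) with its N-terminal signal peptide removed.
    """
    if not 0 <= signal_peptide_length < len(precursor):
        raise ValueError("signal peptide must be shorter than the precursor")
    mature = precursor[signal_peptide_length:].upper()
    met_count = mature.count("M")
    return met_count, 100.0 * met_count / len(mature)


# Hypothetical toy precursor with a 4-residue "signal peptide"; not a real protein.
count, percent = methionine_content("MKKLLAMSAMQM", 4)
print(f"{count} ({percent:.1f}%)")  # 3 (37.5%)
```

Excluding the signal sequence matters because it is cleaved on export and often starts with the initiator Met, which would otherwise inflate the count.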
Neutrophils support lung colonization of
metastasis-initiating breast cancer cells


Stefanie K. Wculek¹ & Ilaria Malanchi¹


Despite progress in the development of drugs that efficiently 
target cancer cells, treatments for metastatic tumours are often 
ineffective. The now well-established dependency of cancer cells on 
their microenvironment¹ suggests that targeting the non-cancer-cell
component of the tumour might form a basis for the development 
of novel therapeutic approaches. However, the as-yet poorly 
characterized contribution of host responses during tumour growth 
and metastatic progression represents a limitation to exploiting this 
approach. Here we identify neutrophils as the main component and 
driver of metastatic establishment within the (pre-) metastatic lung 
microenvironment in mouse breast cancer models. Neutrophils 
have a fundamental role in inflammatory responses and their 
contribution to tumorigenesis is still controversial²⁻⁴. Using various
strategies to block neutrophil recruitment to the pre-metastatic site, 
we demonstrate that neutrophils specifically support metastatic 
initiation. Importantly, we find that neutrophil-derived leukotrienes 
aid the colonization of distant tissues by selectively expanding the 
sub-pool of cancer cells that retain high tumorigenic potential. 
Genetic or pharmacological inhibition of the leukotriene-generating 
enzyme arachidonate 5-lipoxygenase (Alox5) abrogates neutrophil 
pro-metastatic activity and consequently reduces metastasis. Our 
results reveal the efficacy of using targeted therapy against a specific 
tumour microenvironment component and indicate that neutrophil 
Alox5 inhibition may limit metastatic progression. 

In the presence of a growing tumour, subclinical changes in leukocyte composition at distant sites have been reported to favour metastatic growth⁵. Cancer cells within a tumour are heterogeneous and differ in tumorigenic potential. Nonetheless, metastasis-initiating cells (MICs) depend on a favourable microenvironment to grow efficiently at the distant site⁶⁻¹⁰. We therefore reasoned that an altered presence of leukocytes within distant tissues of tumour-bearing hosts might influence specific subsets of disseminating cancer cells. We investigated this hypothesis using the lung-metastatic MMTV-polyoma middle T antigen (PyMT) mammary tumour mouse model, which allows monitoring of the cell subpopulation functionally defined by a higher metastasis-initiation ability (CD24+CD90+ MICs)⁸.

In accordance with previous reports¹¹, we found CD11b+Ly6G+ neutrophils to be systemically mobilized in MMTV-PyMT+ tumour-bearing mice and, despite their low frequency within the primary tumour microenvironment, they were the main immune component that increased in metastatic lungs (Fig. 1a and Extended Data Fig. 1a–l). Importantly, CD11b+Ly6G+ cells accumulated in the lung before cancer cells infiltrated the tissue (pre-metastatic lung), and their numbers increased during metastatic progression (metastatic lung) (Fig. 1a, b).
We addressed the functional relevance of high CD11b+Ly6G+ neutrophil numbers by analysing the metastatic progression of MMTV-PyMT+ tumour-bearing mice in a neutropenic granulocyte colony-stimulating factor (Gcsf)-null background. G-CSF-deficient mice developing mammary tumours failed to accumulate neutrophils in the lungs (Fig. 1d and Extended Data Fig. 2a). Notably, genetic neutropenia resulted in a robust reduction of spontaneous lung metastasis, despite not affecting primary tumour growth (Fig. 1e, g and Extended Data Fig. 2b). No differences in lung macrophages compared with wild-type mice were detected (Extended Data Fig. 2c). Lack of G-CSF expression by cancer cells altered neither lung neutrophil accumulation nor metastasis (Extended Data Fig. 2d). In an alternative genetic strategy for neutrophil depletion, we crossed MMTV-PyMT+ mice with neutrophil elastase (Ela2)-Cre and with ROSA-Flox-STOP-Flox diphtheria toxin (DTA) mice. Here, neutrophil-specific Cre expression led to DTA-mediated reduction of lung neutrophils in tumour-bearing mice, without altering lung macrophages and circulating myeloid cells or activating bone marrow natural killer (NK) and cytotoxic T cells (Extended Data Fig. 2e, f, h–j). Importantly, metastatic progression was impaired in MMTV-PyMT+ Ela2-Cre-DTA+ mice without affecting primary tumour growth (Fig. 1f and Extended Data Fig. 2f, g).

Since the lung neutrophil increase precedes cancer cell infiltration (Fig. 1b), we focused on the CD11b+Ly6G+ cells accumulating in the early phase of lung colonization. We established mammary gland tumours by orthotopic transplantation to synchronize tumour growth, distant neutrophil accumulation and metastatic progression (Extended Data Fig. 3a). The comparison of tumour-induced CD11b+Ly6G+ cells and CD11b+Ly6G+ neutrophils from healthy lungs revealed only minor variations: messenger RNA expression of just two of seven tested neutrophil-secreted factors changed (Extended Data Fig. 3b). Tumour-mobilized lung neutrophils appeared morphologically mature (Fig. 1c), and the upregulation of CD31 suggests increased lung infiltration¹² (Extended Data Fig. 3b). Together, these data indicate that, at this time point, tumour-induced CD11b+Ly6G+ cells in the lung are mature neutrophils similar to those found in healthy lungs.
As neutrophils in the tumour context are reported to act as myeloid-derived suppressor cells¹³, we investigated the presence of an anticancer immune environment within the pre-metastatic lung of immune-competent mice. We used anti-Ly6G blocking antibody to deplete neutrophils during the pre-metastatic stage (Extended Data Fig. 4a). No significant differences were found in the frequencies and activation of various immune components as a consequence of neutrophil depletion, in particular in cytotoxic T and NK cells (Extended Data Figs 4b–o and 5a–i). To explore further the functional contribution of lung neutrophils to metastasis independently of potential immunosuppression, we performed time-controlled neutrophil depletion with anti-Ly6G antibody in immune-compromised mice (Rag1-null) harbouring primary tumours. Remarkably, pre-metastatic neutrophil depletion during metastatic colonization decreased spontaneous metastasis (Fig. 1h–j, l). Concomitantly, lungs of the same mice were synchronously seeded with cancer cells isolated from MMTV-PyMT+ actin-green fluorescent protein (GFP) tumours by intravenous injection to initiate lung colonization (Fig. 1h). Notably, GFP+ cancer cells colonizing neutrophil-depleted lungs were significantly reduced, revealing the relevance of lung neutrophils specifically during metastatic initiation (Fig. 1k, l). No alterations were found in the extravasation efficiency of labelled cancer cells (data not shown). Although we cannot exclude
Figure 1 | Neutrophils infiltrate pre-metastatic lungs and favour metastasis. a, b, Analysis of wild-type (WT) or MMTV-PyMT+ mice. a, Lung neutrophil frequencies determined by flow cytometry (n = 5 (wild type), n = 4 (pre-metastatic lung), n = 4 (metastatic lung)). Met., metastatic. b, Lung neutrophils or cancer cells determined by histology staining for S100A9 or PyMT (brown). Scale bars, 100 µm. Magnifications in insets. c, Haematoxylin & eosin (H&E)-stained neutrophil. Scale bar, 5 µm. d, Lung neutrophil quantification by flow cytometry (n = 5 (wild type), n = 4 (PyMT+ Gcsf+/+), n = 7 (PyMT+ Gcsf-/-)). e, f, Spontaneous metastasis of MMTV-PyMT+ Gcsf+/+ (n = 13) or MMTV-PyMT+ Gcsf-/- (n = 24)

a contribution of other cells to a favourable pre-metastatic environment⁶,⁷, such as monocytes¹⁴, these results reveal that the breast-tumour-induced systemic accumulation of neutrophils coincidentally acts as a pre-metastatic niche in tissue targeted for metastatic dissemination.

Next, we investigated a potential direct effect of neutrophil-secreted factors on tumour cells. Pre-metastatic lung neutrophils (Extended Data Fig. 6a, b) were used to condition cell culture medium for 14 h (LuN medium). Primary MMTV-PyMT tumour cells cultured in LuN medium in non-adherent culture showed enhanced sphere growth (Fig. 2a, b). Furthermore, short-term exposure to LuN medium in adherent culture boosted the tumorigenic potential of cancer cells in vivo and in vitro (Fig. 2c, d and Extended Data Fig. 6c, d). Importantly, short-term culture in LuN medium also increased the metastatic initiation potential of total cancer cells (Fig. 2e, f).

Cancer cells are also heterogeneous when disseminated into the circulation¹⁵ and might respond differently to environmental
(e) and MMTV-PyMT+ control (n = 14) or MMTV-PyMT+ Ela2-Cre-DTA+ (n = 6) mice (f). g, Representative H&E-stained sections of lung. Scale bar, 500 µm. h, Experimental setup for neutrophil depletion. i, Flow cytometric lung neutrophil quantification (n = 4 (tumour-free), n = 12 (IgG tumour), n = 11 (Ly6G tumour)). j, k, Spontaneous (n = 8 per group) (j) and experimental metastasis (n = 12 per group) (k). Lin, CD45, CD31, TER119. l, Histological GFP-stained lung sections including close-up on spontaneous (arrow) and experimental metastases (brown). Scale bar, 500 µm. Statistical analysis by two-sided t-test. Data are represented as mean ± standard error of the mean (s.e.m.). *P < 0.05, **P < 0.01, ***P < 0.001.


stimulations¹⁶. We therefore probed whether neutrophil-secreted factors influence the relative amount of highly metastatic cells. We monitored the previously described MIC population (CD24+CD90+)⁸ after exposing tumour cells seeded into the lung to either LuN medium or freshly isolated pre-metastatic lung neutrophils (Fig. 2g). Notably, both settings induced a doubling of MIC frequencies among the total cancer cell population (Fig. 2h, i and Extended Data Fig. 6e–h) and partially increased metastatic growth (Extended Data Fig. 6i–k). Collectively, we observe that neutrophil-derived factors shift the heterogeneity of cancer cells in favour of MICs and increase the metastatic competence of total cancer cells (Fig. 2j).

We aimed to identify the neutrophil-secreted factors mediating this activity. LuN medium contains many factors (data not shown), including CCL2, MMP9, interleukin (IL)-6 and IL-1, that might alter inflammatory responses and increase pro-tumorigenic behaviour¹⁷⁻¹⁹. Various cells in the tumour microenvironment can secrete these mediators,
Figure 2 | Neutrophil-derived signals promote tumorigenicity and increase the metastatic cell sub-pool. a, b, Images and quantification (technical replicates n = 14 (control), n = 9 (LuN) of biological triplicates) of primary MMTV-PyMT spheres in the indicated medium. SFI, sphere formation index. Scale bar, 10 µm. c–f, Medium-pre-treated luciferase+ MMTV-PyMT cells (c) grafted onto the mammary gland (d) or intravenously injected (e, f) into Rag1-null mice. Lung metastases quantified by histological sectioning (n = 5 (control), n = 4 (LuN)).

so we concentrated on specific innate leukocyte-derived factors. We detected high levels of the lipids leukotriene B4 (LTB4) and cysteinyl leukotrienes C4, D4 and E4 (LTC/D/E4), products of the Alox5 enzyme²⁰ (Fig. 3a–c). Importantly, direct leukotriene (LT) stimulation boosted sphere formation, and a short 3-day LT exposure of total cancer cells enhanced their tumour-initiation potential (Extended Data Fig. 7a–c). Notably, cells expressing LT receptors (LTRs; LTB4 receptor 2 (BLT2) and LTC/D/E4 receptor 2 (CysLT2))²¹,²² appeared to be enriched among MICs within total MMTV-PyMT cancer cells as well as among other known tumorigenic subpopulations of breast cancer
f, Representative bioluminescence signal. g, Experimental setup. h, i, Flow cytometric quantification of MICs in lungs of LuN-treated (n = 3 (PyMT control), n = 4 (PyMT + LuN)) (h) or neutrophil-treated mice (n = 3 (PyMT control), n = 4 (PyMT + neutrophils)) (i). j, Representation of cell heterogeneity change. Statistical analysis by two-sided t-test (b), Mann–Whitney test (e) and one representative experiment of two analysed by analysis of variance (ANOVA) (h, i). Data are represented as mean ± s.e.m. *P < 0.05, **P < 0.01.


cell lines²³⁻²⁵ (Fig. 3d, e and Extended Data Fig. 7d–i). Indeed, LTRs themselves identified MMTV-PyMT cancer cells with high sphere- and tumour-formation abilities (Extended Data Fig. 7j–…).

In accordance with LTR expression on MICs, we found that 3-day LT stimulation of MMTV-PyMT tumour cells in vitro increased MIC frequency and metastatic initiation capacity in vivo (Fig. 3f–h), similar to neutrophil-derived mediators (Fig. 2e–j). LT stimulation also enriched the CD49fhigh sub-pool among 4T1 cells (Extended Data Fig. 8b). Other cells such as macrophages and eosinophils respond to LTs, but no broader inflammatory reaction was detected at this stage
Figure 3 | LTs enrich for MICs and tumorigenicity. a, b, Enzyme immunoassay detecting LTB4 (n = 4 per group) (a) or LTC/D/E4 (n = 2 per group) (b). c, Overview of LTs and LTRs. d, e, Flow cytometric quantification of BLT2+ (n = 4 tumours) (d) and CysLT2+ cells (n = 2 tumours) (e) among indicated sub-pools. f–h, Representation of LT treatment (f); frequency of MICs (n = 8 per group) (g); and experimental lung metastasis (n = 6 per group) with representative images of GFP+ colonies (h). Scale bar, 3 mm. Lin, CD45, CD31, TER119. i, Western blot of ERK1/2 phosphorylation and total ERK1/2 levels of LTB4- or LTC/D/E4-treated cells for the indicated times (min). Loading control: anti-vinculin antibody. j, k, 5-Bromodeoxyuridine (BrdU) incorporation comparing LT-treated MICs with non-MICs (n = 3 (non-MICs), n = 4 (MICs)) (j) or MICs treated with LTs and/or PD0325901 MEK inhibitor (MEKi; n = 3 per group) (k). DMSO, dimethylsulfoxide treated; EtOH, ethanol treated. Statistical analysis by two-sided t-test (a, d, h, j, k) and one-sample t-test (g). Data are represented as mean ± s.e.m. NS, not significant. *P < 0.05, **P < 0.01. Blot source data are in Supplementary Fig. 1.
Figure 4 | Alox5 inhibition decreases lung metastasis initiation. a, b, Alox5-null bone marrow (BM) chimaera experimental setup (a) and spontaneous metastasis (n = 6 (wild-type bone marrow), n = 9 (Alox5-/- bone marrow)) (b). WT, wild type. c, Surface metastases of medium-pre-treated cancer cells (n = 10 (control), n = 9 (LuN wild type), n = 5 (LuN Alox5ko), n = 3 (LuN-Zil)). d, Experimental setup for Zil treatment. e–k, Spontaneous (e) and experimental (f, i, k) metastasis of MMTV-PyMT cells (n = 9 (PyMT DMSO), n = 7 (PyMT Zil)) (e–g), 4T1 cells (n = 5 (4T1 DMSO), n = 7 (4T1 Zil)) (h, i) or MDA-MB-231 cells (n = 6

(Extended Data Figs 4 and 5). In summary, LTs appear to shift heterogeneous cancer cell populations in favour of highly metastatic cells and enhance metastatic competence.

In line with previous reports on LTB4 signalling²¹,²⁶, cancer cells responded to both LTB4 and LTC/D/E4 with increased phosphorylation of extracellular-signal-regulated kinases (ERK)1 and 2 (Fig. 3i and Extended Data Fig. 8c, d). LTR+ cells were required to detect an LT-dependent increase in phosphorylated (p)ERK1/2 (Extended Data Fig. 8e–g), and inhibitors of BLT2 and CysLT2 interfered with ERK1/2 activation (Extended Data Fig. 8h–k). Finally, 3-day LTC/D/E4 treatment increased the frequency of LTR+ cancer cells, suggesting a functional boost in proliferation (Extended Data Fig. 8l). Indeed, LT
spontaneous (arrows) and experimental metastases (brown) (g) or H&E-stained (h, k). Scale bars, 500 µm. l–o, BLT2 (l, m) or CysLT2 (n, o) staining (brown) of human breast cancer and matched lymph node (LN) metastases (n > 30 per group). Quantification of staining intensity and frequency (l, n) and representative images (m, o). Scale bar, 50 µm. Statistical analysis by two-sided t-test (b, e, f, i), Mann–Whitney test (c) and one-sided t-test (k). Data are represented as mean ± s.e.m. NS, not significant. *P < 0.05, ***P < 0.001.


treatment specifically increased the proliferation of MICs in a MAPK/ERK kinase (MEK) 1- and 2-mediated, pERK1/2-dependent manner (Fig. 3j, k and Extended Data Fig. 8m). These results indicate that LTs provide a selective proliferative advantage to cancer cells with intrinsically higher tumorigenicity (Extended Data Fig. 8a).

To confirm the functional relevance of LTs in vivo, we took advantage of an Alox5-null mouse model (Fig. 3c). We generated bone marrow chimaeric mice in which Alox5 is genetically deleted in the radiosensitive immune cell compartment. Mice with Alox5-null bone marrow grafted with MMTV-PyMT cells showed unaltered primary tumour growth and neutrophil lung accumulation (Fig. 4a and Extended Data Fig. 9a–d), yet the efficiency of spontaneous metastasis was reduced (Fig. 4b). Next, we generated LT-deficient LuN medium (LuN-Alox5ko) from Alox5-null pre-metastatic lung neutrophils. Importantly, LuN-Alox5ko medium failed to boost the metastatic potential of luciferase-expressing MMTV-PyMT cells after 3-day pre-treatment (Figs 2c, 4c and Extended Data Fig. 9e, f). Taken together, these data show that Alox5 products are crucial for the pro-metastatic activity of neutrophils.

LTs are important mediators in inflammatory asthma and are targeted by the specific Alox5 inhibitor zileuton (Zil)²⁷. We explored Zil-mediated inhibition of LT synthesis to treat metastatic breast cancer in mice. Zil blocked LT production in vivo, as shown by decreased LTB4 levels in LuN medium (LuN-Zil) (Extended Data Fig. 10a, b), and, consequently, LuN-Zil medium failed to enhance metastasis (Fig. 4c). Importantly, in a therapeutic setting (Fig. 4d), treatment of MMTV-PyMT tumour-harbouring mice with Zil reduced spontaneous metastasis (Fig. 4e, g), without altering primary tumours or lung neutrophil levels (Extended Data Fig. 10c, d). Additionally, the colonization capacity of GFP+ MMTV-PyMT cancer cells seeded into lungs of Zil-treated mice was reduced (Fig. 4f, g). We confirmed that metastatic cancer cells showed reduced proliferation very early after infiltrating Zil-treated lungs (Extended Data Fig. 10e). Taken together, these data suggest a potential therapeutic approach to target this novel LT/Alox5-dependent pro-metastatic activity of neutrophils.

Importantly, the efficacy of Zil treatment in limiting metastatic progression was confirmed in two metastatic breast cancer cell lines, mouse 4T1 cells and human MDA-MB-231 cells (Fig. 4h–k and Extended Data Fig. 10f–i). As Zil treatment had no effect on long-term primary tumour growth in vivo or on cancer cell behaviour in vitro (Extended Data Fig. 10j–m), we exclude the involvement of Alox5 products in a cancer-cell autocrine loop.

Clinical data correlating high neutrophil levels with poorer prognosis²⁸,²⁹, together with the LTR expression detected in human metastatic ductal and lobular breast carcinoma and their lymph-node metastases (Fig. 4l–o), suggest that a similar neutrophil pro-metastatic mechanism might boost human breast cancer progression to the lung.

We have identified a novel LT/Alox5-dependent pro-metastatic 
activity of neutrophils supporting highly metastatic cells that can be 
targeted by Zil, offering hope for new cancer therapeutics. 


Online Content Methods, along with any additional Extended Data display items and 
Source Data, are available in the online version of the paper; references unique to 
these sections appear only in the online paper. 
METHODS

Mouse strains. The MMTV-PyMT mice were a gift from E. Sahai. MMTV-PyMT actin-GFP mice (expressing green fluorescent protein under the control of the actin promoter), Gcsf-null and Rag1-null mice were a gift from J. Huelsken. The MMTV-PyMT actin-luciferase transgenic line (expressing firefly luciferase under the control of the actin promoter) was a gift from D. Bonnet, and Rosa26R-eGFP-DTA mice were a gift from C. Reis e Sousa. Ela2-Cre knock-in mice and Alox5-null mice were purchased from the European Mouse Mutant Archive (EMMA) and the Jackson Laboratory, respectively. All mouse strains have been described previously. All strains had been kept for >10 generations on an FVB/N and/or C57BL/6 background, except Gcsf-null, Ela2-Cre and Rosa26R-eGFP-DTA mice, which were used in a mixed background with littermate controls. Female mice were used at 6–9 weeks of age, except in spontaneous cancer models. Breeding and all animal procedures were performed in accordance with UK Home Office regulations under project licence PPL/80/2531.

Mouse experiments. Where applicable, mice were anaesthetized with IsoFlo (isoflurane, Abbott Animal Health) and temporarily treated with the analgesics Vetergesic (Alstoe Animal Health) and/or Rimadyl (Pfizer Animal Health). For tumour studies under project licence PPL/80/2531, the overriding determinant was animal welfare. The National Cancer Research Institute (NCRI) Guidelines for the Welfare and Use of Animals in Cancer Research were followed. When assessing primary tumour growth, a mean diameter of 1.5 cm for single tumours was not exceeded. However, for multifocal disease such as MMTV-PyMT cancer, provided that there were no additional adverse welfare consequences for the animal, the total superficial tumour burden was allowed to exceed these dimensions when essential for the achievement of the scientific objective, namely spontaneous metastasis. Mice were monitored daily for signs of adverse effects. The source data for primary tumour growth are in Supplementary Fig. 3.

Tumour cell transplantations and induction of experimental metastasis. FVB/N wild-type mice were used for MMTV-PyMT tumour cell transplantations to isolate lung neutrophils. Rag1-null mice were used for human tumour cells and for mouse tumour cells expressing GFP or luciferase. Primary MMTV-PyMT, MMTV-PyMT actin-GFP or MMTV-PyMT actin-luciferase cells (10°–10° cells per injection), the mouse mammary cancer cell line 4T1, either unlabelled or stably expressing GFP from the mouse phosphoglycerate kinase 1 (PGK) promoter (10° cells per injection), and the human breast cancer cell line MDA-MB-231, either unlabelled or stably expressing actin-GFP (1–2 × 10° cells per injection), were used. For experimental metastasis, tumour cells were resuspended in 100 µl PBS and injected into the tail vein. For orthotopic transplantations, tumour cells were resuspended in 50 µl growth-factor-reduced Matrigel (Costar) and transplanted into the fourth mammary fat pad on both flanks (MMTV-PyMT and MDA-MB-231 cells) or one flank only (4T1 cells).

Neutrophilia and lung immune cell infiltration in MMTV-PyMT+ mice. MMTV-PyMT+ mice that spontaneously developed a primary tumour and had visible lung metastasis were used, together with tumour-free littermate controls, to determine immune cell presence in the lung and neutrophil presence in other organs. To determine the timing and dynamics of lung infiltration by neutrophils and cancer cells, MMTV-PyMT+ mice harbouring spontaneously developed tumours of 1.5–2 g were used. Neutrophil infiltration was quantified by flow cytometry and by histological staining of lung sections for S100A9; cancer cell presence was assessed by examining six histological lung sections (100 µm apart) for PyMT staining to confirm the pre-metastatic state. Neutrophil infiltration of the pre-metastatic lung before cancer cell arrival was confirmed in FVB/N wild-type mice carrying two primary tumours originating from orthotopic injection of primary MMTV-PyMT cancer cells; these mice were used for analysis after daily treatment with anti-Ly6G or control IgG antibody starting 24 h before tumour cell implantation.

Analysis of MMTV-PyMT+ G-CSF and MMTV-PyMT+ Ela2-DTA mice. Mice were culled and analysed about 6 weeks after spontaneous primary tumour onset; no differences were observed in tumour onset among the different genotypes.
Treatments with neutrophil-blocking antibody anti-Ly6G or Zil. Rat anti-Ly6G antibody (12.5 µg per mouse; clone 1A8 from BioXcell) or rat IgG isotype control (provided by the Cell Services Unit of The Crick Institute) in 100 µl saline was administered daily by intraperitoneal injection. Zil (LKT Laboratories) dissolved in DMSO (Sigma), or DMSO alone, was given orally by pipetting onto the back of the tongue once a day at a dosage of 100 µg Zil per g body weight.
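The dosage is expressed per gram of body weight (100 µg/g, equivalent to 100 mg/kg), so each mouse receives a different absolute amount. A minimal sketch of the per-mouse arithmetic, assuming a hypothetical Zil stock concentration in DMSO (the stock concentration and dosing volume are not stated in this section):

```python
ZIL_DOSE_UG_PER_G = 100.0  # dosage stated in the Methods: 100 µg Zil per g body weight


def zileuton_daily_dose(body_weight_g: float, stock_mg_per_ml: float) -> tuple[float, float]:
    """Return (dose in mg, volume of DMSO stock in µl) for one daily oral dose.

    `stock_mg_per_ml` is a hypothetical stock concentration chosen by the user;
    it is not specified in this section of the Methods.
    """
    if body_weight_g <= 0 or stock_mg_per_ml <= 0:
        raise ValueError("body weight and stock concentration must be positive")
    dose_mg = body_weight_g * ZIL_DOSE_UG_PER_G / 1000.0   # µg -> mg
    volume_ul = dose_mg * 1000.0 / stock_mg_per_ml          # (mg / (mg/ml)) in ml, times 1000 -> µl
    return dose_mg, volume_ul


# Example: a 20 g mouse and a hypothetical 50 mg/ml stock.
dose_mg, volume_ul = zileuton_daily_dose(20.0, 50.0)
print(f"{dose_mg:.1f} mg in {volume_ul:.0f} µl")  # 2.0 mg in 40 µl
```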

Lung colonization by cancer cells after neutrophil depletion or Zil treatment. Rag1-null mice were orthotopically transplanted with unlabelled mammary tumour cells 4 weeks before labelled tumour cells were injected via the tail vein (MMTV-PyMT and 4T1, 10° cells; MDA-MB-231, 10° cells). Anti-Ly6G or Zil treatment for 2 weeks (10 days for 4T1) started 1 day before intravenous injection of cancer cells. Total primary tumour burden, neutrophil presence in the lung, spontaneous lung metastasis incidence from the transplanted primary tumour and/or experimentally induced lung metastasis originating from the intravenously injected cancer cells were then analysed.

Of note, only experimental metastases are present in lungs harbouring MDA-MB-231 cells, whereas predominantly spontaneous metastases are visible in lungs harbouring 4T1 cells, owing to the high spontaneous metastasis rate of primary 4T1 tumours. Only GFP+ experimental metastases induced by cancer cell injection were quantified in these experiments.

Tumour/metastasis initiation potential assay in vivo. Primary MMTV-PyMT cells were either sorted for the presence or absence of BLT2 and/or CysLT2, or treated for 3 days on collagen-coated dishes with either neutrophil-conditioned medium or LTB4 and LTC/D/E4. Subsequently, 10°–10⁴ cells were orthotopically transplanted into the mammary gland, or 10° cells were injected via the tail vein into Rag1-null mice, and mammary tumour growth or lung metastasis incidence was analysed about 3 weeks later.

MICs or metastasis quantification after neutrophil/LuN injection. To analyse total cancer cells at early stages, Rag1-null mice were injected with 0.5–1 × 10° MMTV-PyMT actin-GFP cells via the tail vein, followed 12 h later by intravenous injection of 25 × 10° neutrophils (freshly isolated from MMTV-PyMT tumour-transplanted mice) or, 12, 24 and 36 h later, by intravenous injection of 200 µl lung neutrophil-conditioned or control sphere medium (described later). Cancer cells in the lung were analysed 3 days after the initial tumour cell injection for frequencies of CD90+ MICs among GFP+CD24+ (non-MIC) cancer cells. To determine the effects of neutrophils or neutrophil-conditioned medium on metastatic burden, Rag1-null mice were intravenously injected with 1–10 × 10° MMTV-PyMT actin-GFP or actin-luciferase cells, followed immediately and 2 and 4 days later by injection of 25 × 10° neutrophils, or 3–5 times every 12 h by injection of 200 µl lung neutrophil-conditioned medium. Metastatic burden was determined by flow cytometric analysis of GFP+ cancer cells 1 week later or by bioluminescence imaging of luciferase+ cancer cells 2–4 weeks later, respectively.

Analysis of functional effects of G-CSF deficiency in MMTV-PyMT cancer cells. 
Rag1-null mice were transplanted with 10° Gcsf-null primary MMTV-PyMT can- 
cer cells into two mammary glands and tumour growth, spontaneous metastatic 
incidence and neutrophil presence in the lung were analysed 4 weeks thereafter. 
Bone marrow transplantation and semi-quantitative PCR. C57BL/6 wild-type mice were lethally irradiated (2 × 600 rad, 4 h apart) and 24 h later injected via the tail vein with 2 × 10° bone marrow cells freshly isolated from C57BL/6 wild-type or Alox5-null donor mice. Bone marrow chimaeric mice were orthotopically transplanted with 10° MMTV-PyMT cells into the fourth mammary fat pad on both sides 8 weeks after bone marrow reconstitution, and primary tumour size, neutrophil infiltration into the lung and lung metastasis were analysed 6 weeks later. Chimaeric mice were generated in a pure C57BL/6 background; therefore, MMTV-PyMT cells from the same background were used to generate primary tumours. In this less tumorigenic background, metastasis occurs in only about 50% of the mice. No alteration in this penetrance was observed between wild-type and Alox5-null bone marrow chimaeric mice. Figure 4b quantifies animals harbouring metastatic disease.

The percentage of bone marrow reconstitution was calculated by isolating total DNA from the bone marrow of chimaeras and performing semi-quantitative PCR with a calibration curve generated from 100% wild-type DNA mixed at defined ratios with 100% Alox5-null DNA. PCR was performed using REDTaq reagents (Sigma) (primers are listed in Supplementary Information) with 25 amplification cycles before loading on an agarose gel. The ratio between the wild-type and Alox5-null bands was calculated for every mouse, and the percentage chimaerism was determined by comparison with the calibration curve. Chimaerism was consistently between 80 and 96%.
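The chimaerism estimate amounts to reading each mouse's band ratio off the calibration curve. A minimal sketch, assuming piecewise-linear interpolation between the calibration mixtures (the exact fitting procedure is not specified here); the function name and calibration values are hypothetical and purely illustrative:

```python
from bisect import bisect_left


def percent_alox5_null(sample_ratio: float, calibration: list[tuple[float, float]]) -> float:
    """Estimate % Alox5-null (donor) DNA from a wild-type:Alox5-null band ratio.

    `calibration` holds (band_ratio, percent_alox5_null) pairs measured on
    DNA mixtures of known composition. Values between calibration points are
    linearly interpolated; extrapolation outside the curve is refused.
    """
    points = sorted(calibration)            # ascending band ratio
    ratios = [ratio for ratio, _ in points]
    if not ratios[0] <= sample_ratio <= ratios[-1]:
        raise ValueError("sample ratio lies outside the calibrated range")
    i = bisect_left(ratios, sample_ratio)   # first point with ratio >= sample
    if ratios[i] == sample_ratio:
        return points[i][1]
    (r0, p0), (r1, p1) = points[i - 1], points[i]
    return p0 + (p1 - p0) * (sample_ratio - r0) / (r1 - r0)


# Hypothetical calibration mixtures (band ratio, % Alox5-null); not data from the study.
calibration = [(0.1, 90.0), (0.25, 80.0), (1.0, 50.0), (4.0, 20.0)]
print(percent_alox5_null(0.625, calibration))  # 65.0
```

Refusing to extrapolate mirrors the fact that a semi-quantitative curve is only trustworthy within the range of the mixtures actually run on the gel.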

Tumour and metastasis burden evaluation. See Supplementary Methods. 

In vivo luciferase-activity detection. Mice inoculated with actin-luciferase-expressing MMTV-PyMT cells were shaved around the chest area and injected intraperitoneally with 3 mg XenoLight D-luciferin potassium salt (PerkinElmer) in PBS 5 min before imaging for at least 45 min using the IVIS Spectrum Preclinical In Vivo Imaging System (PerkinElmer). The maximum bioluminescence intensity signal for the lung of every mouse was determined using Living Image 4.3.1 software.

Tissue staining, immunohistochemistry and light microscopy. Mouse lung tissue was fixed in 4% paraformaldehyde in PBS for 24 h and embedded in paraffin blocks. Four-micrometre sections were stained. The breast cancer tissue array paired with metastatic tumours (96 samples, 1.5 mm) was purchased from Abcam (ab178118). H&E staining was performed according to standard procedures.

For immunohistochemistry, either horseradish peroxidase (HRP)-conjugated secondary antibodies were used in combination with DAB Peroxidase substrate, or the VECTASTAIN ABC kit was used (all Vector Laboratories), according to the manufacturer's instructions. Specific primary antibodies were used (see Supplementary Information). Cell nuclei were visualized with haematoxylin, and analysis employed the Nikon Eclipse 90i light microscope and NIS-Elements software.

Scoring of LTR expression in breast cancer tissue and lymph node metastasis. 
See Supplementary Methods. 

Tissue digestion for cell isolation or analysis. MMTV-PyMT cell isolation was described in detail previously*. In brief, primary MMTV-PyMT tumours, liver, spleen and lung were dissected, minced, digested with Liberase (Roche) and DNase I (Sigma) in HBSS and passed through a 100 µm cell strainer. Some tumour cells were used for cell culture at this point. Bone marrow cells were isolated by crushing the femur and tibia, and blood was collected by bleeding from the tail vein with heparin (Sigma) as an anticoagulant. For flow cytometric analysis or further purification, single-cell suspensions of tumour, liver, spleen, lung, bone marrow and blood were subjected to hypotonic lysis (Red Blood Cell Lysis Solution, Miltenyi) to remove erythrocytes and washed with 1× PBS/2 mM EDTA/0.5% BSA.

Flow cytometry and cell sorting. Prepared single-cell suspensions of mouse tissues and in vitro treated cancer cells were incubated with mouse FcR Blocking Reagent (Miltenyi), followed by incubation with specific pre-labelled antibodies (alone or in combination) or with specific antibodies in combination with fluorescently labelled secondary antibodies (Invitrogen) (see Supplementary Information). Dead cells were stained with 4′,6-diamidino-2-phenylindole (DAPI) or propidium iodide (PI; both Sigma). The LSRFortessa cell analyser running FACSDiva software (BD Biosciences) and FlowJo software were used. Tumour cells were flow-sorted using the Influx cell sorter running FACS Sortware sorter software (BD Biosciences). MMTV-PyMT cells were used in experiments immediately after sorting, and sorted 4T1 cells were cultured in adherent conditions for 3 days before western blot analysis.

Neutrophil isolation and neutrophil-conditioned medium. Freshly isolated lung cells from wild-type mice orthotopically transplanted with MMTV-PyMT tumours were incubated with mouse FcR Blocking Reagent (Miltenyi) and APC-coupled anti-Ly6G (clone 1A8) antibody (BD Biosciences), followed by incubation with magnetic anti-APC microbeads (Miltenyi). Magnetically labelled neutrophils were isolated using LS columns (Miltenyi) according to the manufacturer's instructions. Neutrophil purity and viability were measured by flow cytometry. Some isolated Ly6G+ cells were smeared onto a glass slide and air-dried overnight, followed by H&E staining to evaluate cell morphology. The remaining neutrophils were kept in sphere medium at a concentration of 10° neutrophils per 150 µl medium for 14 h to allow conditioning. Neutrophils and cell debris were removed by centrifugation, and the conditioned medium was occasionally snap-frozen before use.

Cell culture and in vitro cancer cell treatments. All cell lines used were provided by the Cell Services Unit of The Crick Institute, which routinely tests for Mycoplasma contamination; they were not further authenticated in our laboratory. Cell lines were cultured in DMEM supplemented with 10% fetal bovine serum (DMEM/FCS, both Invitrogen). Freshly isolated MMTV-PyMT cells were cultured overnight on PureCol collagen (Advanced Biomatrix)-coated dishes in growth medium (DMEM/F12 with 2% FBS, 20 ng ml−1 EGF (Invitrogen) and 10 µg ml−1 insulin (Sigma)) before use in experiments. All in vitro and in vivo experiments involving primary MMTV-PyMT cells were performed with at least two primary tumour cell preparations from different spontaneous MMTV-PyMT+ mice. Unless otherwise specified, each in vitro and in vivo experiment was performed with a different tumour cell preparation.

Primary MMTV-PyMT cells were cultured in sphere medium on collagen-coated dishes, and 4T1 and MDA-MB-231 cells in DMEM/FCS on uncoated dishes or in non-attachment conditions, for the indicated periods of time in the presence of (as indicated for every experiment): control sphere medium, neutrophil-conditioned medium, 100% ethanol control (EtOH, Sigma), DMSO control, 1 µM LTB4, 100 nM LTC/D/E4 (Cysteinyl Leukotriene HPLC Mixture I), 3 µM BLT2 inhibitor LY255283, 0.3 µM CysLT2 inhibitor BAY-u9773 (all Cayman Chemical), 1 µM Zil and/or 1 nM pan-MEK inhibitor PD0325901 (provided by J. Downward), followed by further tests or analysis.

Sphere formation assay. The sphere formation assay was described previously*. In brief, 10* total MMTV-PyMT or flow-sorted cells per well were plated in ultra-low-attachment 96-well plates (Costar) in 100 µl sphere medium (DMEM/F12 supplemented with B-27, 20 ng ml−1 EGF, 20 ng ml−1 FGF (all Invitrogen) and 4 µg ml−1 heparin (Sigma)) or neutrophil-conditioned medium. After 7–10 days, if not otherwise indicated, all formed spheres were quantified from images taken with the inverted Leica DM IRBE light and fluorescence microscope. The area of the plane passing through the sphere centre was measured for every sphere (sphere size) using ImageJ software, and the areas of all formed spheres were summed. The obtained number was divided by the total number of plated cells. This value represents the sphere formation index (SFI) per cell for every experimental group. Freshly isolated MMTV-PyMT cells were either treated only for 3 days in adherent conditions before the sphere assay or treated directly during the sphere assay with neutrophil-conditioned medium, LTB4 and/or LTC/D/E4, or Zil, as indicated. For passaging, cells were counted and re-plated in equal numbers per well for the next passage approximately every 7 to 10 days.
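The sphere formation index defined above reduces to the summed sphere area divided by the number of plated cells. A minimal Python sketch, assuming per-sphere areas have already been exported from ImageJ for each well; the function names and example numbers are illustrative only, and averaging per-well indices into a group value is an assumption about how replicate wells are combined.

```python
from statistics import mean


def sphere_formation_index(sphere_areas, cells_plated):
    """SFI per cell: summed cross-sectional area of all spheres / cells plated."""
    if cells_plated <= 0:
        raise ValueError("cells_plated must be positive")
    return sum(sphere_areas) / cells_plated


def group_sfi(wells):
    """Mean SFI over replicate wells, each given as (sphere_areas, cells_plated)."""
    return mean(sphere_formation_index(areas, n) for areas, n in wells)


if __name__ == "__main__":
    # Hypothetical sphere areas (µm²) in two wells, 1,000 cells plated per well.
    wells = [([1200.0, 800.0, 2000.0], 1000), ([500.0, 1500.0], 1000)]
    print(group_sfi(wells))  # prints 3.0
```

Because the index sums areas rather than counting spheres, it reflects both how many spheres form and how large they grow.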
In vitro and in vivo BrdU incorporation assay. Rag1-null mice carrying MMTV-PyMT tumours were treated daily for 3 days with Zil and intravenously injected with 10° GFP-expressing MMTV-PyMT cancer cells. BrdU (1 mg per mouse) was injected intraperitoneally 18 h after the GFP+ cancer cells, and lungs were harvested and digested 6 h later. In vitro, MMTV-PyMT or 4T1 cells treated for 3 days in adherent conditions as indicated were pulsed with 30 µM BrdU (Sigma) for 3 h and harvested. Cells were incubated with fluorescently labelled anti-CD24 and/or anti-CD90.1 antibody if indicated. The BrdU Flow Kit (BD Biosciences) was used for staining, followed by flow cytometric analysis.
In vitro quantification of primary MICs and sub-pools of cancer cell lines. Primary MMTV-PyMT cells were cultured on collagen-coated dishes for 3 days supplemented with either LTB4 and LTC/D/E4 or Zil, followed by incubation with fluorescently labelled anti-CD90.1 and anti-CD24 antibodies and analysis by flow cytometry. 4T1 and MDA-MB-231 cell lines were cultured in DMEM/FCS supplemented with LTB4 and LTC/D/E4 for 3 days in adherent conditions, followed either by staining with fluorescently labelled anti-CD49f, anti-BLT2, anti-CysLT2 and/or anti-CD44 antibodies or by use of the ALDEFLUOR kit (StemCell Technologies) according to the manufacturer's instructions, and analysed by flow cytometry.
RNA expression/quantitative real-time PCR. Neutrophils were freshly isolated from the lungs of wild-type or MMTV-PyMT tumour-bearing mice. RNA isolation was performed using the MagMAX-96 Total RNA Isolation Kit, and cDNA was synthesized using SuperScript III Reverse Transcriptase. Quantitative PCR reactions were performed using EXPRESS SYBR GreenER reagents with the Applied Biosystems 7500 Fast Real-Time PCR System (all Invitrogen) and specific primers (see Supplementary Information).
Enzyme immunoassay and parameter enzyme-linked immunosorbent assay. Ethanol was used to precipitate protein from cell culture medium before analysis using either the enzyme immunoassays (EIAs) LTC/D/E4 Biotrak EIA System (Amersham) or the LTB4 EIA Kit (Cayman Chemical) according to the manufacturer's instructions.
Western blot analysis and protein detection. Primary MMTV-PyMT cells grown on collagen-coated dishes were cultured overnight in DMEM/F12 with B-27 and 4 µg ml−1 heparin (Sigma) before treatment with 1 µM LTB4 or 100 nM LTC/D/E4. Unsorted or sorted LTR-reduced 4T1 cells were stimulated with LTB4, LTC/D/E4, BLT2 inhibitor LY255283 and/or CysLT2 inhibitor BAY-u9773 as indicated. Cells were washed and protein was isolated using RIPA buffer (25 mM Tris-hydrogen chloride pH 7.6, 50 mM sodium chloride, 1% NP-40, 1% sodium deoxycholate, 0.1% sodium dodecyl sulfate) freshly supplemented with 1 µM sodium pyrophosphate, 1 µM β-glycerophosphate, 1 µM sodium vanadium oxide, 1 µM sodium fluoride, 1 µM sodium molybdate (all Sigma) and cOmplete ULTRA Tablets (Roche), and processed by standard western blot techniques. Membranes were blocked with 5% BSA in PBS with 0.5% Tween-20 (Sigma) and incubated with specific primary antibodies (see Supplementary Information). The ECL Western Blotting System, including secondary antibodies, and Hyperfilm ECL (both Amersham) were used. Protein lysates of 3 h LTB4-stimulated MDA-MB-231 cells were analysed using the Proteome Profiler Human Phospho-Kinase Array Kit (R&D Systems) according to the manufacturer's instructions. Western blot quantification was performed on scanned films using ImageJ software.
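The Extended Data legends report ERK1/2 phosphorylation relative to α-vinculin. A hedged Python sketch of that arithmetic, assuming band intensities have already been measured in ImageJ (for example as integrated densities after background subtraction) and that values are additionally expressed relative to an unstimulated control lane, which is an assumption not stated in the text. The function names and numbers are hypothetical.

```python
def normalized_band(target, loading):
    """Target band intensity divided by the loading-control (α-vinculin) band."""
    if loading <= 0:
        raise ValueError("loading-control intensity must be positive")
    return target / loading


def fold_change(target, loading, ctrl_target, ctrl_loading):
    """Loading-normalized signal of one lane relative to a control lane (assumed)."""
    return normalized_band(target, loading) / normalized_band(ctrl_target, ctrl_loading)


if __name__ == "__main__":
    # Hypothetical densitometry values (arbitrary units): phospho-ERK1/2 and α-vinculin.
    print(fold_change(target=3000.0, loading=1500.0, ctrl_target=1000.0, ctrl_loading=1000.0))
```

Dividing by the loading control first corrects for unequal protein loading between lanes before lanes are compared with each other.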
Statistical analysis. Data analyses used GraphPad Prism version 7. The data are presented as mean ± standard error of the mean, individual values, 'scatter plot with Tukey box and whiskers' and/or 'scatter plot with column bar' graphs, and were analysed using Student's t-tests (paired or unpaired according to the experimental setting), Mann-Whitney tests, one-sample t-tests and two-way ANOVA as indicated in the legends. Data were pooled from at least two experiments, except Fig. 4c, i, k and Extended Data Figs 2d, 4d–m, 5a, b, f, h, 6k, 10e, in which data are at least biological triplicates generated in parallel. Two-way ANOVA was performed when the control groups differed significantly between experiments. The western blots in Extended Data Fig. 8i, k, the proteome profiler dot blot in Extended Data Fig. 8d and the BrdU incorporation of 4T1 cells in Extended Data Fig. 10k were performed once. Extended Data Fig. 3b (mRNA expression) compares biological triplicates of pre-metastatic lung neutrophils with a representative control (wild-type) value. The experiments were not randomized and there was no blinding, as animals or samples were marked. No statistical methods were used to predetermine sample sizes. Sample sizes were based on previous experience with the models*!". n values represent biological replicates, with the exception of the sphere assays, for which both technical and biological replicates are shown.

Differences were considered significant when P < 0.05 and are indicated as NS, not significant, *P < 0.05, **P < 0.01, ***P < 0.001.
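The significance notation above amounts to a small lookup. A minimal Python sketch; the function name is hypothetical, and the thresholds are exactly those stated in the text (strict inequalities, so P = 0.05 is reported as NS and P = 0.001 as **).

```python
def significance_label(p):
    """Map a P value to the NS / * / ** / *** notation used in the figures."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("P must lie between 0 and 1")
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return "NS"


if __name__ == "__main__":
    # P values taken from the Extended Data Fig. 3b table.
    for p in (0.0044, 0.0200, 0.3522):
        print(p, significance_label(p))
```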
Systemic increase of neutrophils in MMTV-PyMT+ mice:
Extended Data Figure 1 | Mammary tumour-bearing MMTV-PyMT+ mice specifically show neutrophilia in the metastatic lung. a–c, Flow cytometric quantification of CD11b+Ly6G+ neutrophils in the bone marrow (n = 6 (wild type), n = 5 (MMTV-PyMT+)) (a), liver (n = 4 (wild type), n = 5 (MMTV-PyMT+)) (b) and spleen (n = 6 (wild type), n = 5 (MMTV-PyMT+)) (c) of wild-type (WT) and tumour-bearing MMTV-PyMT+ mice. d, Quantification of neutrophils in the tumour and metastatic lung of MMTV-PyMT+ mice (n = 2 per group); pre-metastatic lung neutrophil levels depicted in Fig. 1a are shown for comparison in dashed lines. Met., metastatic. e–l, Flow cytometric quantification of immune cell frequencies in wild-type and metastatic [...] not otherwise indicated) including CD45+ total immune cells (e), total CD11b+F4/80+ macrophages (f) (n = 4 (wild type), n = 4 (metastatic)), the CD11b^lowF4/80^high alveolar macrophage subpopulation (n = 4 (wild type), [...] Data are represented as mean ± s.e.m. NS, not significant, *P < 0.05, **P < 0.01, ***P < 0.001.
Extended Data Figure 2 | Analysis of MMTV-PyMT+Gcsf−/− mice, G-CSF-deficient MMTV-PyMT cancer cells and MMTV-PyMT+Ela2-Cre-DTA+ mice. a, Representative flow cytometric analysis of CD11b+Ly6G+ neutrophils in the lung of wild-type and tumour-bearing MMTV-PyMT+Gcsf+/+ and MMTV-PyMT+Gcsf−/− mice. b, Primary mammary tumour burden of MMTV-PyMT+Gcsf+/+ (n = 13) or MMTV-PyMT+Gcsf−/− (n = 24) mice. c, Flow cytometric quantification of frequencies of total CD11b+F4/80+ macrophages (left), the CD11b^lowF4/80^high alveolar macrophage subpopulation (middle) and the CD11b^highF4/80^low interstitial macrophage subpopulation (right) in the lung of tumour-bearing MMTV-PyMT+Gcsf+/+ (n = 4) and MMTV-PyMT+Gcsf−/− (n = 7) mice. d, MMTV-PyMT+Gcsf−/− primary cancer cells were freshly isolated and grafted onto two mammary glands of Rag1-null mice (10° cells per injection) and analysed 5 weeks thereafter. CD11b+Ly6G+ neutrophil presence in the lung was assessed by flow cytometry (left), primary tumour burden was assessed by weighing (middle) and spontaneous lung metastasis incidence was assessed by quantification of visible surface lung metastases relative to tumour load (right) (n = 3 (Gcsf+/+), n = 4 (Gcsf−/−)). e–g, Analysis of tumour-bearing MMTV-PyMT+ control and MMTV-PyMT+Ela2-Cre-DTA+ mice. Representative flow cytometric analysis of CD11b+Ly6G+ neutrophils in the lung (e). Lung neutrophil quantification (n = 8 (wild type), n = 7 (PyMT+ control), n = 5 (PyMT+ Ela2-Cre-DTA+)) (f, left) and primary mammary tumour burden (n = 14 (PyMT+ control), n = 6 (PyMT+ Ela2-Cre-DTA+)) (f, right) with representative H&E-stained histological lung sections (g). Scale bar, 500 µm. h, Flow cytometric quantification of frequencies of total CD11b+F4/80+ macrophages (left), the CD11b^lowF4/80^high alveolar macrophage subpopulation (middle) and the CD11b^highF4/80^low interstitial macrophage subpopulation (right) in the lung of tumour-bearing MMTV-PyMT+ control (n = 7) and MMTV-PyMT+Ela2-Cre-DTA+ (n = 5) mice. i, Frequencies of bone marrow (top) and blood (bottom) CD11b+Ly6G+ neutrophils (left; blood n = 3 (wild type), n = 6 (PyMT+ control)), CD11b+F4/80+ macrophages (middle; blood n = 3 (wild type), n = 6 (PyMT+ control)) and CD11b+CD115+ monocytes (right; blood n = 3 (wild type), n = 5 (PyMT+ control)) in wild-type, MMTV-PyMT+ control and MMTV-PyMT+Ela2-Cre-DTA+ mice analysed by flow cytometry (n = 4 (wild type), n = 6 (PyMT+ control), n = 2 (PyMT+ Ela2-Cre-DTA+) if not otherwise indicated). j, Exclusion of immune responses against DTA expression in the bone marrow by analysis of NK-cell (left) and cytotoxic T-cell (right) activation. Flow cytometric quantification of activated CD69+ among total CD45+CD49b+ NK cells as well as activated CD44+ or CD69+ among total CD45+CD3+CD8+ cytotoxic T cells in the bone marrow of wild-type (n = 4), MMTV-PyMT+ control (n = 6) and MMTV-PyMT+Ela2-Cre-DTA+ (n = 2) mice. Statistical analysis by two-sided t-test. Data are represented as mean ± s.e.m. NS, not significant, *P < 0.05, **P < 0.01, ***P < 0.001.
Cytometric analysis: mean intensity of CD11b+Ly6G+ cells
                          WT                  Pre-met.            Significance
Cell size (FSC)           84295 ± 199.0       87223 ± 628.1       N = 4, P = 0.0044 **
Cell granularity (SSC)    54932 ± 1467        62774 ± 2020        N = 4, P = 0.0200 *

Surface expression: percent positive of CD11b+Ly6G+ cells
                          WT                  Pre-met.            Significance
CXCR2+                    97.90 ± 0.524       98.45 ± 0.1500      N = 4, P = 0.3522
CD31+                     31.58 ± 3.728       49.45 ± 1.900       N = 4, P = 0.0053 **
MHC-I+                    30.60 ± 1.696       28.77 ± 2.381       N = 4, P = 0.5533
ICAM1+                    23.90 ± 2.515       23.90 ± 2.515       N = 3 (WT), N = 6 (Pre-met.), P = 0.9008
Fas+                      99.35 ± 0.050       99.35 ± 0.050       N = 2 (WT), N = 3 (Pre-met.), P = 0.1697

mRNA expression: fold change relative to wild type (neutrophil purity = 90%)
                          WT                  Pre-met.            Significance
TNFa                      1                   0.9585 ± 0.1735     P = 0.8334
Arginase 1                1                   0.7667 ± 0.1924     P = 0.3492
VEGF-A                    1                   0.6814 ± 0.1494     P = 0.1666
CCL2                      1                   0.4157 ± 0.0932     P = 0.0245 *
CCL3                      1                   0.7391 ± 0.1584     P = 0.2414
iNOS                      1                   1.0360 ± 0.4653     P = 0.9506
CCL5                      1                   0.0517 ± 0.0042     P < 0.0001 ***


Extended Data Figure 3 | Comparison of wild-type lung neutrophils with tumour-induced, pre-metastatic lung neutrophils.

a, Representation of timing and dynamics of neutrophil and cancer cell infiltration into the lung of mice grafted with two mammary tumours by orthotopic injection of 10° MMTV-PyMT tumour cells. b, Flow cytometric analysis for cell size (forward scatter (FSC)), granularity (side scatter (SSC)) and expression of surface markers CXCR2, CD31, MHC-I, MHC-II, ICAM1 and Fas (n is indicated) as well as mRNA expression analysis of Tnfa, arginase 1, Vegfa, Ccl2, Ccl3, iNOS (also known as Nos2) and Ccl5 by quantitative polymerase chain reaction (PCR) of CD11b+Ly6G+ wild-type (WT) or pre-metastatic (Pre-met.) lung neutrophils 3 weeks after primary tumour graft (n = 3 (pre-metastatic compared with one normal lung reference)). Statistical analysis by two-sided t-test (flow cytometry) and one-sample t-test (mRNA). Data are represented as mean ± s.e.m. *P < 0.05, **P < 0.01, ***P < 0.001.
Addressing the immunologic response in tumour-bearing mice:
Extended Data Figure 4 | Immune cell frequencies and activation in the pre-metastatic lung of MMTV-PyMT tumour-bearing mice are not dependent on neutrophil presence (part 1). a, Representation of timing and dynamics of neutrophil and cancer cell infiltration into the lung of mice grafted with two mammary tumours by orthotopic injection of 10° MMTV-PyMT tumour cells. b–o, Flow cytometric quantification and representative analysis of the following immune cell types in wild-type (WT) or pre-metastatic (Pre-met.) lungs treated daily with either control IgG or anti-Ly6G (1A8) neutrophil-blocking antibody from tumour onset onwards (n = 4 per group if not otherwise indicated): b, f, total CD45+ immune cells (n = 12 per group); c, g, CD11b+Ly6G+ neutrophils (n = 8 per group); d, g, CD11b+SiglecF+ eosinophils; e, g, CD11b^lowF4/80^high alveolar macrophages and CD11b^highF4/80^low interstitial macrophages; h, j, CD45+CD11c+ dendritic cells; i, k, MHC-II+CD86+ activated dendritic cells; l, n, CD45+CD19+ B cells; and m, o, MHC-II+CD86+ activated B cells. Statistical analysis by two-sided t-test. Data are represented as mean ± s.e.m. NS, not significant, *P < 0.05, **P < 0.01, ***P < 0.001.
Extended Data Figure 5 | Immune cell frequencies and activation in the pre-metastatic lung of MMTV-PyMT tumour-bearing mice are not dependent on neutrophil presence (part 2). a–i, Flow cytometric quantification and representative analysis of the following immune cell types in wild-type (WT) or pre-metastatic (Pre-met.) lungs treated daily with either control IgG or anti-Ly6G (1A8) neutrophil-blocking antibody from tumour onset onwards (n = 4 per group if not otherwise indicated): a, c, CD45+CD49b+ NK cells; b, c, CD69+ activated NK cells; d, e, CD45+CD3+CD8+ cytotoxic T cells (n = 8 per group); f, g, CD44+ or CD69+ activated T cells; and h, i, the ratio of CD4+CD25+Foxp3+ regulatory T cells per activated T cell. Statistical analysis by two-sided t-test. Data are represented as mean ± s.e.m. NS, not significant.
Extended Data Figure 6 | Neutrophil isolation from the lung of MMTV-PyMT+ mice and effect of neutrophil-derived factors on tumour formation potential. a, Representative flow cytometric analysis of neutrophil purity after isolation from the pre-metastatic lung compared to total lung tissue. Only neutrophil preparations with >90% purity were used for further experiments. b, Neutrophil viability was assessed by flow cytometry for propidium iodide (PI) negativity after isolation (n = 10). c, d, MMTV-PyMT cells grown in control or LuN medium for 3 days in adherent conditions were plated in non-attachment conditions followed by sphere quantification at day 10 post-seeding (technical replicate n = 17 (control), n = 21 (LuN) of biological triplicates) (c) or 10 cells were grafted onto the mammary gland of Rag1-null mice for analysis of tumour formation potential (d). Tumour burden was determined by weighing about 3 weeks later (n = 12 per group), complementary to Fig. 2d. e–h, Flow cytometric quantification of frequencies of total GFP-labelled MMTV-PyMT cells present (e, g) and frequencies of CD24+CD90+ MICs among total GFP-labelled MMTV-PyMT cells (f, h) in the lung of Rag1-null mice 3 days after intravenous injection of 5 × 10° total GFP-labelled MMTV-PyMT cells followed by either three intravenous injections with control or LuN medium (n = 6 (PyMT+ control), n = 8 (PyMT+ LuN)) (e, f) or by one intravenous injection with 25 × 10° neutrophils freshly isolated from a pre-metastatic lung (n = 7 (PyMT+ control), n = 8 (PyMT+ neutrophils)) (g, h). f, h, Two independent experiments are shown to complement Fig. 2h, i. Exp, experiment. i–k, Experimental setup (i): Rag1-null mice were intravenously injected with 1–10 × 10° (j) or 0.5 × 10° total GFP-labelled MMTV-PyMT cells (k) followed by either 3–5 intravenous injections with 200 µl control or LuN medium (j) or by 3 intravenous injections with 25 × 10° neutrophils (k) freshly isolated from a pre-metastatic lung. Quantification of experimental metastatic incidence by determination of bioluminescence intensity (n = 7 (control), n = 9 (LuN)) (j) or flow cytometric analysis of GFP+ cancer cells in the lung (n = 5 (control), n = 4 (neutrophil)) (k) is shown. Statistical analysis by two-sided t-test (c, j, k) and two-way ANOVA (d–h). Data are represented as mean ± s.e.m. NS, not significant, *P < 0.05, **P < 0.01, ***P < 0.001.
Extended Data Figure 7 | LTRs are expressed on mouse and human breast cancer cells and enriched on metastasis-initiating and highly tumorigenic cancer cell sub-pools. a, Sphere formation potential of MMTV-PyMT cells in the presence of LTB4 or LTC/D/E4 (technical replicate n = 8 per group of biological triplicates). b, c, Three-day LTB4- and LTC/D/E4-treated MMTV-PyMT cells in adherent culture were analysed for primary tumour initiation potential by orthotopic transplantation of 10° cells in Rag1-null mice (n = 14 per group) (b). Exp, experiment; TC, tumour cell isolation. A representative image of tumours is shown (c). d, e, Flow cytometric analysis of primary MMTV-PyMT cancer cells, the mouse mammary cancer cell line 4T1 and the human breast cancer cell line MDA-MB-231 for expression of BLT1 or BLT2 (d) as well as CysLT1 or CysLT2 (e). f, Representative flow cytometric analysis of BLT2+ and CysLT2+ cells among MMTV-PyMT non-MICs and MICs. g–i, Flow cytometric quantification of LTR expression on Aldefluor (ALD)+ (n = 3 per group) (g) or CD44+ MDA-MB-231 cells (n = 4 per group) (h) as well as CD49f+/high 4T1 cells (n = 4 per group) (i). j–l, Sorted LTR+ or LTR− MMTV-PyMT tumour cells were plated in non-attachment conditions followed by sphere quantification at day 10 post-seeding (technical replicate n = 10 per group of biological duplicates) (j), or 10° cells were grafted onto the mammary gland of Rag1-null mice for analysis of tumour formation potential. Tumour burden was determined by weighing (n = 8 per group) after 3 weeks (k), and a representative image of tumours is shown (l). Statistical analysis by two-sided t-test (a, h–k) and two-way ANOVA (b). Data are represented as mean ± s.e.m. NS, not significant; *P < 0.05, **P < 0.01, ***P < 0.001.
Extended Data Figure 8 | LTs promote stemness within the total cancer cell population by specifically promoting proliferation of MICs.

a, In vitro passaging (P indicates passage number) in non-adherent conditions of sorted CD24+CD90+ MICs and CD24+CD90− non-MICs (n = 4 per group for P0 + P1 and n = 3 per group for P2 + P3). Quantification was performed by determining the percentage of remaining cell number after 7–10 days. b, Flow cytometric quantification of 3-day LT-treated 4T1 cells for frequency of highly tumorigenic CD49f^high cells (n = 6). c, Quantification of western blots for ERK1/2 phosphorylation of MMTV-PyMT cells following LTB4 (left) or LTC/D/E4 (right) stimulation relative to α-vinculin as shown in Fig. 3i (n = 2 per time point except n = 9 (30 min LTB4)). d, Dot blot and quantification of ERK1/2 phosphorylation in MDA-MB-231 cells after 3 h stimulation with LTB4 measured by R&D Proteome Profiler Human Phospho-Kinase Array (ARY003B; one-membrane array). e, Flow cytometric quantification of LTR expression of sorted LTR-reduced 4T1 cells (n = 3 per group). f, g, Representative analysis and quantification of western blots for total ERK1/2 and ERK1/2 phosphorylation relative to α-vinculin of unsorted 4T1 cells or 4T1 cells sorted for LTR negativity (n = 2 per group). h–k, Analysis and quantification of western blot for total ERK1/2 and ERK1/2 phosphorylation relative to α-vinculin of 4T1 cells following LTB4 (h, i) or LTC/D/E4 (j, k) stimulation in the presence of BLT2 inhibitor LY255283 or CysLT2 inhibitor BAY-u9773, respectively (one time series). Dotted lines indicate the control level of ERK1/2 phosphorylation. The decrease of ERK1/2 phosphorylation observed after 5–15 min when adding both leukotrienes and their receptor inhibitors is due to the increase in ethanol concentration. Data are shown as ERK1/2 phosphorylation recovery and increase from 5 to 45 min after stimulation (i, k). l, Flow cytometric quantification of 3-day LTC/D/E4-treated MDA-MB-231 cells for frequency of LTR+ cells (n = 4 per group). m, Three-day LT-treated MMTV-PyMT cells in adherent culture were analysed for BrdU incorporation of CD24+CD90− non-MICs in the additional presence of PD0325901 MEK inhibitor (MEKi; n = 3 per group). DMSO, dimethylsulfoxide treated. Statistical analysis by two-sided t-test (l, m) and one-sided t-test (b). Data are represented as mean ± s.e.m. NS, not significant; *P < 0.05. Blot source data are shown in Supplementary Fig. 1.
a, PCR test for successful generation of Alox5-null chimaeric mice
Extended Data Figure 9 | Analysis of Alox5-null bone marrow chimaeric mice transplanted with primary mammary MMTV-PyMT tumours and failure of Alox5-null neutrophils to support cancer cell metastatic initiation potential. a, Efficiency of chimaeric mouse generation was determined by semi-quantitative PCR analysis of DNA isolated from the bone marrow of lethally irradiated wild-type mice reconstituted with wild-type or Alox5-null bone marrow. A calibration curve of the ratio between the PCR bands amplified from the wild-type (WT) and Alox5-null (KO) alleles was used to calculate the percentage of bone marrow reconstitution efficiency. Tests of 8 representative Alox5−/− chimaeric mice and 10 controls are shown. Only mice with >80% Alox5-null bone marrow reconstitution were used for further experiments. b–d, Analysis of wild-type and Alox5-null bone marrow chimaeric mice 1.5 months after transplantation with 2 mammary MMTV-PyMT tumours (10° PyMT cells) or tumour-free controls. Representative flow cytometric analysis (b) and quantification of CD11b+Ly6G+ neutrophil presence in the lung (c) (n = 4 (wild type), n = 4 (Alox5−/−), n = 5 (wild-type PyMT), n = 7 (Alox5−/− PyMT)) as well as primary mammary tumour burden (n = 6 (wild-type PyMT), n = 9 (Alox5−/− PyMT)) (d). e, f, 5 × 10° luciferase-expressing MMTV-PyMT cells treated with control, wild-type LuN (LuN-WT) or Alox5-deficient neutrophil-derived LuN (LuN-Alox5ko) medium for 3 days in adherent culture were intravenously injected into Rag1-null mice. Quantification of cancer-cell-derived bioluminescence in the lung over time (n = 5 (control), n = 5 (LuN-WT), n = 4 (LuN-Alox5ko)) (e) and a representative image (f) are shown. Statistical analysis by two-sided t-test. Data are represented as mean ± s.e.m. *P < 0.05, **P < 0.01. Blot source data are shown in Supplementary Fig. 2.
Extended Data Figure 10 | Breast cancer cell growth, proliferation and self-renewal are not directly affected by treatment with the Alox5 inhibitor Zil. a, b, Neutrophils were isolated from the lungs of MMTV-PyMT mammary tumour-bearing mice treated daily with Zil and used to condition culture medium (LuN-Zil) (a). Enzyme-immunoassay analysis of LTB4 levels in control, LuN or LuN-Zil medium (n = 4 (control), n = 4 (LuN), n = 3 (LuN-Zil)) (b). c, d, f–i, Analysis of CD11b+Ly6G+ neutrophils in the lung by flow cytometry (c, f, h) and primary tumour burden (d, g, i) at the time of analysis of Rag1-null mice orthotopically transplanted and intravenously injected with 10° GFP-labelled primary MMTV-PyMT cancer cells (n = 3 (DMSO), n = 9 (PyMT DMSO), n = 7 (PyMT Zil)) (c, d), 10° mouse 4T1 cancer cells (n = 4 (DMSO), n = 5 (4T1 DMSO), n = 7 (4T1 Zil)) (f, g) or 10° human MDA-MB-231 cancer cells (n = 4 (DMSO), n = 6 (MDA231 DMSO), n = 5 (MDA231 Zil)) (h, i), and treated with Zil to complement Fig. 4d–k. e, Determination of in vivo cancer cell proliferation 18 h after intravenous injection of 10° GFP-labelled MMTV-PyMT cancer cells into MMTV-PyMT tumour-bearing, Zil-treated mice by 6 h BrdU pulse and flow cytometric quantification of BrdU+ among GFP+ cancer cells in the lung (n = 3 (PyMT DMSO), n = 4 (PyMT Zil)). j, Quantification of mammary tumour load of control (DMSO) or Zil-treated wild-type mice 4–6 weeks after orthotopic transplantation with 10° MMTV-PyMT cells onto the mammary gland. Daily Zil treatment started 1 day prior to mammary tumour engraftment (n = 11 (DMSO), n = 8 (Zil)). k, Flow cytometric quantification of BrdU incorporation after a 3 h pulse of two primary MMTV-PyMT cell preparations and one culture of the mouse 4T1 cell line treated with 1 µM Zil for 24 h in adherent conditions. l, Flow cytometric quantification of frequency of CD24+CD90+ MICs in total MMTV-PyMT cells after 3-day treatment with 1 µM Zil or control DMSO in adherent culture (n = 3 per group). m, Sphere formation of MMTV-PyMT cancer cells in the presence of 1 µM Zil after 7 days (technical replicate n = 8 per group of biological duplicates). Statistical analysis by two-sided t-test (b–d, f–m) and one-sided t-test (e). Data are represented as mean ± s.e.m. NS, not significant, *P < 0.05, ***P < 0.001.
Neuroblastoma is a paediatric malignancy that typically arises in
early childhood, and is derived from the developing sympathetic
nervous system. Clinical phenotypes range from localized
tumours with excellent outcomes to widely metastatic disease in
which long-term survival is approximately 40% despite intensive
therapy. A previous genome-wide association study identified
common polymorphisms at the LMO1 gene locus that are highly
associated with neuroblastoma susceptibility and oncogenic
addiction to LMO1 in the tumour cells¹. Here we investigate the
causal DNA variant at this locus and the mechanism by which it
leads to neuroblastoma tumorigenesis. We first imputed all possible
genotypes across the LMO1 locus and then mapped highly associated
single nucleotide polymorphisms (SNPs) to areas of chromatin
accessibility, evolutionary conservation and transcription factor
binding sites. We show that SNP rs2168101 G>T is the most highly
associated variant (combined P = 7.47 × 10⁻²⁹, odds ratio 0.65,
95% confidence interval 0.60-0.70), and resides in a super-enhancer
defined by extensive acetylation of histone H3 lysine 27 within the
first intron of LMO1. The ancestral G allele that is associated with
tumour formation resides in a conserved GATA transcription
factor binding motif. We show that the newly evolved protective
TATA allele is associated with decreased total LMO1 expression
(P = 0.028) in neuroblastoma primary tumours, and ablates GATA3
binding (P < 0.0001). We demonstrate allelic imbalance favouring
the G-containing strand in tumours heterozygous for this SNP, as
shown by both RNA sequencing (P < 0.0001) and reporter
assays (P = 0.002). These findings indicate that a recently evolved
polymorphism within a super-enhancer element in the first intron of
LMO1 influences neuroblastoma susceptibility through differential
GATA transcription factor binding and direct modulation of LMO1
expression in cis, and this leads to an oncogenic dependency in
tumour cells.

Genome-wide association study (GWAS) efforts frequently identify 
highly statistically significant genetic associations within non-coding 
regulatory regions of the genome, but the underlying causal DNA 
sequence variations have only been identified in a few instances. A 
neuroblastoma GWAS has identified several disease susceptibility
loci’-’, of which the most robust signal lies within the LIM domain only 1
(LMO1) locus at 11p15 (ref. 1). LMO1 encodes a transcriptional co-regulator
containing two zinc-finger LIM domains that nucleate and regulate
transcription factor complexes. The main members of the LMO gene
family, LMO1-4, are all implicated in cancer, including LMO1 and
LMO2 translocations in T-cell leukaemia’, and we previously provided
the first evidence that LMO1 is a bona fide neuroblastoma oncogene’.
Here, we sought to identify the causal polymorphism(s) driving the
LMO1 genetic association with neuroblastoma susceptibility as a basis
for understanding neuroblastoma initiation and addiction mechanisms.

We first performed fine mapping of associated germline SNPs and
indels at the LMO1 gene locus by imputation to the 1000 Genomes
Project for our European-American neuroblastoma GWAS*. This
identified 27 SNPs with minor allele frequency (MAF) > 0.01 and an
association P < 1 x 10~° (Fig. 1a and Extended Data Table 1). We further
prioritized associated variants by evolutionary conservation, and
by their regulatory potential inferred through neuroblastoma-specific
DNase I hypersensitivity mapping and chromatin immunoprecipitation
sequencing (ChIP-seq) from the ENCODE Consortium (Fig. 1b).
These data showed that the most significantly associated SNP at the
LMO1 locus (rs2168101, odds ratio = 0.67, P = 4.14 × 10⁻¹⁶) resides
within a highly conserved and active enhancer region inferred by
DNase I sensitivity and p300 binding in the SKNSH neuroblastoma cell
line (Fig. 1b). Notably, we found no rare or common non-synonymous
coding variants in LMO1 in a combined cohort of 482 unique neuroblastoma
cases with germline whole-genome (n = 136), whole-exome
(n = 222) and/or targeted DNA sequencing (n = 183) (see Extended
Data Table 2 and Supplementary Data).

Because rs2168101 genotypes were imputed in our analyses 
(Extended Data Fig. 1), we next directly genotyped this SNP in 146 out 
of 2,101 European-American cases, and measured an 86% imputation 
accuracy (Supplementary Table 1). We additionally directly genotyped 
rs2168101 in two independent cohorts from the UK and Italy, with both 
showing robust replication (Table 1). We did not observe replication 
in an independent African-American cohort. Notably, the protective T 
allele is common in Europeans (CEU HapMap: 28%) and East Asians 
(CHB+JPT HapMap: 32%), but is rare or absent in Africans, indicat- 
ing recent expansion of the rs2168101 protective allele in non-African 
human populations. Meta-analysis demonstrated a combined association
P = 7.47 × 10⁻²⁹ across 8,553 controls and 3,254 cases (Table 1).

As causal SNPs driving GWAS associations may disrupt transcrip- 
tion factor binding at distal enhancers, we sought to identify candi- 
date SNPs disrupting known JASPAR motifs®, which revealed that lead 
Figure 1 | Imputation-based GWAS and epigenomic profiling by
ENCODE identifies rs2168101 as a candidate functional SNP at LMO1.
a, Manhattan plot for neuroblastoma GWAS (cases = 2,101; controls = 4,202).
The neuroblastoma-associated region falls within a 40-kilobase (kb)
haplotype block (grey box) in Europeans, encompassing the LMO1
3′ terminus. rs2168101 is the most associated variant and is moderately
correlated (maximum r = 0.52) with other variants. The sentinel SNP
reported previously, rs110419, is also highlighted (#). b, Associated variants
(P < 1 x 10~°) are plotted with ENCODE tracks for neuroblastoma cell line
SKNSH. Two SNPs, rs2168101 and rs7948497, were annotated ‘enhancer
SNPs’ based on overlapping DNase I hypersensitivity peaks and p300 binding.
The rs2168101 G>T SNP disrupts an evolutionarily conserved GATA transcription
factor (TF) motif (5′-A[G/T]ATAA-3′). SKNSH has an rs2168101 = G/G genotype that
preserves GATA binding, supported by ENCODE GATA3 ChIP-seq.


candidate SNP rs2168101 resides in a highly conserved GATA-binding 
motif (5′-A[G/T]ATAA-3′, mammalian phastCons score = 100%)
(Fig. 1b). ENCODE transcription factor ChIP-seq confirmed GATA2 
and GATA3 binding at the rs2168101 GATA motif in the neuroblas- 
toma cell lines SKNSH and SHSY5Y, which are G/G homozygous, 
thereby preserving the GATA motif (Fig. 1b). No other associated var- 
iant showed this unique combination of evolutionary conservation, 
active enhancer localization, and disruption of a transcription factor 
binding motif, including the sentinel SNP rs110419 (P= 1.17 x 10-4) 
from our original report¹.

To test for the possibility of multiple statistical signals or enhancers 
not marked by conservation or p300 at the LMO1 locus, we repeated 
association testing conditional on imputed rs2168101 genotypes and 


[Figure 2 plot. a, LMO1 expression (RPKM) in primary tumours by rs2168101 genotype (G/G, G/T); P = 0.028. b, Allelic imbalance in primary tumours by rs2168101 genotype (G/G, G/T); P < 0.0001. c, Allelic expression in cell lines NB1643, NGP, NLF and SY5Y.]

Figure 2 | RNA expression of LMO1 associates with rs2168101 genotype
consistent with regulation in cis. a, mRNA-seq across 127 primary
tumours genotyped for rs2168101 (G/G = 102, G/T = 25, T/T = 0) revealed
a significant decrease in LMO1 gene expression in G/T compared with G/G
tumours (t-test P = 0.028). RPKM, reads per kilobase per million reads.
b, Using the synonymous exonic SNP rs3750952 to measure allelic
expression by mRNA-seq revealed significantly more allelic imbalance
in 12 heterozygous neuroblastoma tumours (rs2168101 = G/T) than
in 33 homozygous tumours (rs2168101 = G/G) (t-test P = 5.3 x 10~°).
c, Allelic expression for rs2168101 from targeted nascent RNA-seq
in four neuroblastoma cell lines. The two heterozygous cell lines
(rs2168101 = G/T) exhibited significantly reduced T-allele expression
compared with the G allele (t-test P = 1.6 x 10°“ and 1.5 x 10°? for NGP and
NLF, respectively; error bars denote 95% confidence intervals across n = 3
replicate experiments).


observed no significant variants after multiple test correction (most 
significant variant: rs34544683, nominal P = 9.0 × 10⁻⁴, Bonferroni
P=1; Extended Data Fig. 2a). To test whether the rs2168101 signal 
can be equally captured by other variants, we also performed recip- 
rocal association tests for rs2168101 conditioned on all 27 other SNPs 
within 1.5 megabases (Mb) of LMO1 passing thresholds MAF > 0.01 
and nominal P <1 x 107°. Notably, rs2168101 remained signifi- 
cant across all conditional tests (worst-case nominal P=2.6 x 107’, 
Bonferroni P = 0.002; Extended Data Fig. 2b). These results are consistent
with a single underlying signal at the LMO1 locus, and reaffirm
that rs2168101 is the single best causal SNP candidate, because its
association with neuroblastoma cannot be accounted for by other variants.

We next sought to determine whether rs2168101 genotypes were 
associated with LMO1 expression by messenger RNA sequencing 
(mRNA-seq) of 127 primary high-risk neuroblastoma tumours. 
Genotyping rs2168101 yielded 102 G/G, 25 G/T and no T/T tumours 
(MAF = 9.8%). We observed significantly higher LMO1 expression 
in G/G versus G/T genotype tumours (t-test P = 0.028; Fig. 2a). 
Notably, the absence of protective homozygous T/T genotypes in 
this high-risk neuroblastoma cohort is consistent with our previous 
observation that the risk alleles predispose to the high-risk phenotypic
subset¹ (for clinical covariate associations, see Extended Data
Table 3). Accordingly, the rs2168101 G/G genotype is highly associ- 
ated with decreased neuroblastoma patient event-free (P = 0.0004) 
and overall (P = 0.0004) survival compared to G/T and T/T geno- 
types together in our European-American cohort (Extended Data 
Fig. 3). Two cell lines with homozygous T/T or T/— genotypes 
expressed LMO1 at comparatively lower levels than cell lines containing
the G allele (Extended Data Fig. 4a).
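
As a quick check on the MAF quoted above, the frequency of the T allele follows directly from the genotype counts, with two alleles per tumour. The short Python sketch below assumes biallelic genotypes; `minor_allele_frequency` is an illustrative helper, not part of the study's pipeline.

```python
def minor_allele_frequency(n_major_hom: int, n_het: int, n_minor_hom: int) -> float:
    """Frequency of the minor allele from biallelic genotype counts (illustrative helper)."""
    n_samples = n_major_hom + n_het + n_minor_hom
    if n_samples == 0:
        raise ValueError("no genotyped samples")
    # Each heterozygote carries one copy of the minor allele, each minor homozygote two.
    minor_alleles = n_het + 2 * n_minor_hom
    return minor_alleles / (2 * n_samples)


# rs2168101 in the mRNA-seq cohort: 102 G/G, 25 G/T and 0 T/T tumours (25 T alleles out of 254)
maf = minor_allele_frequency(102, 25, 0)
print(f"MAF = {maf:.1%}")  # prints MAF = 9.8%
```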

GATA transcription factors mediate chromatin looping and 
facilitate long-range enhancer-promoter interactions to regu- 
late target gene expression!®. We therefore sought to confirm 
allelic imbalance of LMO1 transcripts (a hallmark of gene regu- 
lation in cis), which could result from differential GATA-binding 
caused by rs2168101. First, because the rs2168101 intronic SNP 
is not detectable by mRNA-seq, we identified the LMO1 exonic 
synonymous SNP, rs3750952, which can measure allelic expres- 
sion in the heterozygous state. We identified 45 tumours with 
the necessary rs3750952 = C/G genotype, and then directly
genotyped rs2168101 (G/G=33, G/T = 12, T/T =0) in this panel. By 
mRNA-seq, there was greater allelic imbalance in 12 tumours that were 
heterozygous for rs2168101 (G/T) than in 33 homozygous tumours 
(rs2168101 = G/G; t-test P< 0.0001; Fig. 2b). We next used targeted 
sequencing of nuclear-enriched nascent RNAs in four neuroblastoma 
cell lines (G/G = 1, G/T =2, T/T = 1) to provide direct ascertainment 
of allele-specific expression at rs2168101. In both heterozygous lines, 
we observed allelic imbalance that significantly favoured the risk G 
allele over the protective T allele (Fig. 2c). Collectively, these results 
indicate that the intact GATA motif at rs2168101 results in significantly
higher LMO1 expression levels than the TATA sequence created by the
alternative T allele. Allelic imbalance of LMO1 was not driven by somatic
DNA alterations (for example, loss of heterozygosity) that could affect 
allelic dosage (Extended Data Fig. 4b). 


*P < 0.05, **P < 0.001 by t-test. n = 3 independent
transfections, n = 9 technical replicates.


Examination of neuroblastoma transcriptome data for 127 primary
tumours showed that GATA2 and GATA3 are more highly expressed
than other members of the GATA transcription factor family
(Extended Data Fig. 5a), with GATA3 the most highly expressed.
Additionally, protein immunoblotting showed that GATA3 is uniformly
highly expressed in neuroblastoma cell lines, whereas LMO1 is highly
expressed in the G/G (SKNSH and SHSY5Y), G/- (KELLY) and G/T
(IMR32) cell lines but barely detectable in the BE2C cell line, which
lacks a G allele at the rs2168101 locus (Extended Data Fig. 5b). We
therefore performed ChIP-seq using a GATA3 antibody in neuroblastoma
cell lines, and observed robust GATA3 binding at rs2168101
in lines containing the G allele (SHSY5Y, KELLY, BE2 and NGP) but
not in a line containing only a T allele (BE2C; Fig. 3a). We then
specifically considered GATA3 ChIP-seq reads overlapping rs2168101,
and observed strong preferential binding to the G allele in the G/T
heterozygous cell lines BE2 (0.97 G-allele fraction from 38 reads, 95%
confidence interval: 0.86-1.00, binomial test P = 2.8 × 10⁻¹⁰) and NGP
(1.00 G-allele fraction from 6 reads, 95% confidence interval: 0.54-1.00,
binomial test P = 0.03; Fig. 3b).


Table 1 | Replication and meta-analysis of rs2168101 association

SNP | Ref/alt (major/minor) allele | Cohort | MAF cases | MAF controls | Additive P value | Additive odds ratio | Het odds ratio (GT vs GG) | Hom odds ratio (TT vs GG)
rs2168101 | G/T | European-American* | 0.242 (n = 2,101) | 0.313 (n = 4,202) | 4.14 × 10⁻¹⁶ | 0.67 (0.61-0.74) | 0.69 (0.62-0.77) | 0.52 (0.42-0.64)
 | | Italian | 0.164 (n = 420) | 0.250 (n = 751) | 2.07 × 10⁻⁶ | 0.61 (0.50-0.75) | 0.57 (0.44-0.74) | 0.40 (0.21-0.75)
 | | UK | 0.190 (n = 369) | 0.311 (n = 1,109) | 5.86 × 10⁻¹⁰ | 0.56 (0.47-0.68) | 0.51 (0.39-0.66) | 0.31 (0.18-0.53)
 | | African-American* | 0.0865 (n = 364) | 0.0891 (n = 2,491) | 0.20 | 0.79 (0.56-1.13) | 0.96 (0.71-1.30) | 1.07 (0.38-3.04)
 | | Combined | | | 7.47 × 10⁻²⁹ | 0.65 (0.60-0.70) | 0.67 (0.61-0.73) | 0.49 (0.41-0.59)

Alt, alternative; het, heterozygous; hom, homozygous; ref, reference. Odds ratio 95% confidence intervals are shown in parentheses.
*Imputed genotypes and correction for population stratification.

Acetylation of histone H3 at lysine 27 (H3K27ac) is a hallmark of 
active enhancers’', and ChIP-seq analysis of SHSY5Y (G/G; not MYCN 
amplified), KELLY (G/—; MYCN amplified), BE2 (G/T; MYCN ampli- 
fied) and NGP (G/T; MYCN amplified) neuroblastoma cells showed 
extensive H3K27 acetylation in the first intron of LMO1 across 
rs2168101, which was not observed in BE2C (T/—; MYCN ampli- 
fied; Fig. 3c). This region is classified as a super-enhancer in G-allele- 
containing lines SHSY5Y, KELLY and BE2 based on enhancer clustering 
and especially high H3K27ac signal (NGP was just below the threshold; 
see Methods), a pattern also observed for other known oncogenes and 
tumour suppressor genes in this disease’? (Fig. 3d and Extended Data 
Fig. 6a). No super-enhancer was observed in BE2C or Jurkat T-ALL 
cells that also express LMO1 (ref. 13), or in other non-neuroblastoma 
tissues from ENCODE (Fig. 3d and Extended Data Fig. 6b, c). These 
results are consistent with recent evidence that disease-associated SNPs 
frequently affect enhancers that are specific to disease-relevant cell lines 
and tumour histology, and control developmental stage and tissue- 
specific gene expression!” 4-18, 

We next performed luciferase reporter assays to measure the effect 
of rs2168101 alleles on enhancer activity. HEK293T cells trans- 
fected with constructs containing the risk G allele demonstrated 
30-300-fold higher normalized luminescence compared to the T allele 
(t-test P=0.002, Fig. 3e), whereas luciferase activity of the T allele 
was not significantly different from empty vector, indicating that the 
intact GATA motif is required for robust enhancer activity. Finally, 
knockdown of GATA3 in SHSY5Y and KELLY cells resulted in both 
decreased LMO1 protein levels and suppression of cell growth that was 
rescued by LMO1 overexpression (Fig. 3f and Extended Data Fig. 7), 
indicating the central role of GATA3 in regulating LMO1 expression 
levels in neuroblastoma. 

Taken together, these data demonstrate the underlying molecular 
mechanism for a highly robust genetic association with neuroblastoma,
mediated by a single common causal SNP rs2168101 that disrupts 
a GATA transcription factor binding site within a tissue-specific 
super-enhancer element. The rarity or absence of the protective allele 
in African populations and its relative depletion in African-Americans 
may partially explain the more aggressive clinical course in African- 
American children!’. Moreover, this work further confirms the utility 
of association studies to define clinically relevant oncogenic pathways. 
Finally, the dependence of neuroblastoma cells on super-enhancer- 
mediated LMO1 expression provides another potential mechanism 
for the sensitivity of these tumours to inhibitors of the transcriptional 
machinery such as CDK7 and BET bromodomain proteins!*!°, 
demonstrating the potential of translating basic mechanistic insights 
of tumour initiation towards novel therapeutic strategies. 


Online Content Methods, along with any additional Extended Data display items and 
Source Data, are available in the online version of the paper; references unique to 
these sections appear only in the online paper. 
METHODS


No statistical methods were used to predetermine sample size. 

Genotype imputation and association testing. A primary European-American 
cohort of 2,101 cases and 4,202 matched controls was assayed with Illumina
HumanHap550 v1, Illumina HumanHap550 v3, and Illumina Human610 SNP
arrays as previously described®. Genotypes were phased using SHAPEIT v2.r790 
and data from 1000 Genomes phase 1 version 3. Subsequently, imputation was 
performed using IMPUTE2 v2.3.1 for all SNPs and indel variants annotated in 
the 1000 Genomes phase 1 version 3. Testing for association with neuroblastoma 
under an additive genetic effect model was performed using the frequentist like- 
lihood score method implemented in SNPTEST v2.4.1. Genotypes for a previ- 
ously described African-American cohort of 365 cases and 2,491 controls”? were 
imputed and tested for neuroblastoma association using the same analytic pipeline. 
Statistical adjustment for gender was performed in both cohorts. For population 
stratification adjustment, the first 20 multidimensional scaling (MDS) components 
were included as covariates in the European-American cohort, while a measure of 
African admixture as estimated by the ADMIXTURE software program was used 
in the African-American cohort. Manhattan plots of SNP position and statisti- 
cal significance were generated using LocusZoom software. Linkage plots were 
generated by Haploview software based on HapMap CEU individuals (version 3, 
release 2) using default settings. All research subjects or their guardians provided 
informed consent for research, and all institutions involved in this research had 
regulatory approval for human subjects research. 

Prioritization of candidate causal variants. All SNPs and indels reported in 
the 1000 Genomes phase 1 version 3 data were considered as candidate causal 
variants and were ranked based on a combination of (1) neuroblastoma associ- 
ation in the primary European-American cohort, (2) evolutionary conservation, 
(3) DNase I hypersensitivity, and (4) transcription factor binding motif matching. 
Neuroblastoma association in European-Americans was evaluated as described 
above. Conservation scores were computed as the average of the phastCons46wayPlacental
UCSC conservation track score for all bases from the −10 position to
the +10 position surrounding each candidate variant. A DNase I hypersensitivity
score was calculated by counting the number of sequencing tags from the −100
position to the +100 position around each candidate variant in ENCODE data
for the neuroblastoma cell line, SK-N-SH. Position weight matrices representing 
transcription factor binding motifs were obtained from the JASPAR database, and 
candidate binding sites were identified by scanning the hg19 human reference 
genome using the MATCH-TM algorithm with a matrix similarity score (mSS) 
threshold of 0.90. 
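
The three per-variant annotations described above can be sketched as follows. This is a hedged illustration under stated assumptions, not the authors' pipeline: the window sizes follow the text (±10 bp for phastCons, ±100 bp for DNase I tags), while the matrix similarity score uses the MATCH-style formula mSS = (Curr − Min)/(Max − Min) with information-weighted positions and scans only the forward strand. `GATA_TOY` is a hypothetical frequency matrix invented for illustration, not a JASPAR profile, and all function names are illustrative.

```python
import math
from typing import Dict, List, Sequence, Tuple

BASES = "ACGT"
FreqMatrix = List[Dict[str, float]]  # one {base: frequency} column per motif position

# Hypothetical GATA-like frequency matrix for the core A-G-A-T-A-A
# (illustration only; not the JASPAR profile used in the study).
GATA_TOY: FreqMatrix = [
    {"A": 0.85, "C": 0.05, "G": 0.05, "T": 0.05},
    {"A": 0.05, "C": 0.05, "G": 0.85, "T": 0.05},
    {"A": 0.90, "C": 0.03, "G": 0.04, "T": 0.03},
    {"A": 0.03, "C": 0.03, "G": 0.04, "T": 0.90},
    {"A": 0.90, "C": 0.03, "G": 0.04, "T": 0.03},
    {"A": 0.70, "C": 0.10, "G": 0.15, "T": 0.05},
]


def window_mean(scores: Sequence[float], pos: int, flank: int = 10) -> float:
    """Mean per-base score (e.g. phastCons) from pos - flank to pos + flank, clipped at the ends."""
    lo, hi = max(0, pos - flank), min(len(scores), pos + flank + 1)
    window = scores[lo:hi]
    return sum(window) / len(window)


def tags_in_window(tag_positions: Sequence[int], pos: int, flank: int = 100) -> int:
    """Number of sequencing tags (e.g. DNase I) lying within pos +/- flank."""
    return sum(1 for t in tag_positions if abs(t - pos) <= flank)


def mss(freqs: FreqMatrix, site: str) -> float:
    """MATCH-style matrix similarity score (Curr - Min) / (Max - Min) for one site."""
    # Information vector I(i) = sum_b f(i,b) * ln(4 * f(i,b)); zero frequencies contribute 0.
    info = [sum(f * math.log(4 * f) for f in col.values() if f > 0) for col in freqs]
    current = sum(w * col[b] for w, col, b in zip(info, freqs, site))
    best = sum(w * max(col.values()) for w, col in zip(info, freqs))
    worst = sum(w * min(col.values()) for w, col in zip(info, freqs))
    return (current - worst) / (best - worst)


def scan(freqs: FreqMatrix, seq: str, threshold: float = 0.90) -> List[Tuple[int, str, float]]:
    """Forward-strand scan returning (offset, site, mSS) for windows at or above threshold."""
    width = len(freqs)
    hits = []
    for i in range(len(seq) - width + 1):
        site = seq[i:i + width].upper()
        if any(b not in BASES for b in site):  # skip windows containing N or other symbols
            continue
        score = mss(freqs, site)
        if score >= threshold:
            hits.append((i, site, score))
    return hits


print(round(mss(GATA_TOY, "AGATAA"), 3), round(mss(GATA_TOY, "ATATAA"), 3))
```

With this toy matrix the risk-allele site AGATAA scores 1.0, while the protective-allele site ATATAA falls below the 0.90 threshold, which mirrors in miniature how a single G>T change can remove a motif match.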

Neuroblastoma association replication and meta-analysis for rs2168101. We 
replicated the association of rs2168101 with neuroblastoma by direct genotyping 
of rs2168101 in independent Italian (cases = 420, controls = 751) and UK cohorts 
(cases = 369, controls = 1,109). Meta-analysis across the European-American, 
African-American, Italian and UK cohorts was performed using the inverse var- 
iance method provided in the METAL software program. Beta values (log-odds) 
and standard errors generated by SNPTEST, as described above, were used as input. 
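
Inverse-variance meta-analysis weights each cohort's log-odds ratio by the reciprocal of its squared standard error. The sketch below assumes a fixed-effect model with independent cohorts; it illustrates the idea behind METAL's standard-error-based scheme but is not METAL itself, and the input numbers in the example are invented.

```python
import math
from typing import Sequence, Tuple


def inverse_variance_meta(betas: Sequence[float], ses: Sequence[float]) -> Tuple[float, float, float]:
    """Fixed-effect inverse-variance meta-analysis of per-cohort log-odds ratios.

    Returns (pooled beta, pooled standard error, two-sided P value).
    """
    if len(betas) != len(ses) or not betas:
        raise ValueError("need one standard error per beta")
    weights = [1.0 / se ** 2 for se in ses]  # w_i = 1 / SE_i^2
    total_w = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total_w
    se = math.sqrt(1.0 / total_w)
    z = beta / se
    # Two-sided normal P value; erfc avoids the underflow of 1 - cdf at large |z|.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return beta, se, p


# Invented example: two cohorts with log-odds ratios -0.4 and -0.5.
beta, se, p = inverse_variance_meta([-0.4, -0.5], [0.1, 0.2])
print(f"pooled OR = {math.exp(beta):.3f}, SE = {se:.4f}, P = {p:.2e}")
```

The pooled estimate is pulled towards the more precise cohort, which is why large, well-genotyped cohorts dominate a combined P value.
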

Survival analysis. We compared both overall survival and event-free survival over
a 10-year follow-up period between patients with the G/G rs2168101 genotype and
those with G/T or T/T genotypes, in a case-case comparison among neuroblastoma
patients from the European-American cohort. Because rs2168101 genotypes were
imputed in this cohort, the most probable genotype predicted by IMPUTE2 was used
for each patient. Patients with insufficient follow-up were right-censored. Cox
proportional hazards modelling was performed with MYCN amplification status and
20 MDS components (to account for population stratification) as covariates.
All statistical analysis and generation of Kaplan-Meier plots were performed in R
using the CRAN package ‘survival’.
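
The analysis itself used R's `survival` package. To illustrate how right-censoring enters the Kaplan-Meier curves, the Python sketch below implements only the product-limit estimator, with no Cox model or covariates; the function name and toy data are hypothetical.

```python
from typing import List, Sequence, Tuple


def kaplan_meier(times: Sequence[float], events: Sequence[int]) -> List[Tuple[float, float]]:
    """Kaplan-Meier (product-limit) survival estimate with right-censoring.

    times: follow-up time per patient; events: 1 if the event (e.g. relapse or death)
    was observed, 0 if the patient was right-censored at that time.
    Returns (time, survival probability) at each distinct event time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        # Group everyone with this follow-up time; patients censored at t still count as at risk at t.
        deaths = 0
        n_at_t = 0
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            n_at_t += 1
            i += 1
        if deaths:
            survival *= 1.0 - deaths / n_at_risk
            curve.append((t, survival))
        n_at_risk -= n_at_t
    return curve


# Toy data: five patients, two of them censored (event = 0).
print(kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0]))
```

Censored patients never cause a drop in the curve, but they leave the risk set after their last follow-up, so later events produce proportionally larger steps.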

Total and allele-specific expression analysis. Total and allele-specific RNA expres- 
sion analysis was performed based on poly-A-enriched RNA-sequencing data from 
127 primary neuroblastoma tumours sequenced through the TARGET project. 
RNA-seq reads were aligned to the hg19 human reference genome using the 
STAR aligner (v2.4.0b). Aligned reads were assigned to RefSeq genes using HTSeq 
(v0.6.1) and normalized to RPKM for total gene expression measurements. DNA 
genotypes for rs2168101 were obtained either through matched whole-genome 
sequencing (n= 69) or targeted genotyping assays (n = 58 additional tumours). 
DNA genotypes for rs3750952 were obtained through either matched whole- 
genome or whole-exome sequencing. 
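
RPKM scales a gene's read count by its length in kilobases and by the library size in millions of mapped reads. A minimal sketch, assuming `gene_length_bp` is the summed exon length of the RefSeq gene (the text does not state which length definition was used); the helper name is illustrative.

```python
def rpkm(gene_reads: int, gene_length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of transcript per million mapped reads.

    Equivalent to (gene_reads / (gene_length_bp / 1e3)) / (total_mapped_reads / 1e6).
    """
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and library size must be positive")
    return gene_reads * 1e9 / (gene_length_bp * total_mapped_reads)


# Hypothetical example: 500 reads on a 2 kb gene in a library of 10 million mapped reads.
print(rpkm(500, 2000, 10_000_000))  # prints 25.0
```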

Allele-specific RNA expression analysis was performed from a subset of 45 pri- 
mary neuroblastoma tumours (out of 127) with the necessary synonymous exonic 
SNP genotypes (rs3750952 = C/G) to enable measurement of allelic expression
by mRNA-seq. As a readout for allelic imbalance of rs3750952, we computed 
allelic fractions as min(C, G)/(C + G), since phasing between rs3750952 and 
rs2168101 alleles in each tumour was unknown. Statistical comparison between 
the two groups was performed by two-sided Welch's t-test, comparing 12 tumours
heterozygous for rs2168101 (G/T) with the remaining 33 tumours that were
homozygous for rs2168101 (G/G) as controls. DNA genotyping for rs2168101 was
performed by whole-genome sequencing or a directed genotyping assay, whereas DNA
genotyping for rs3750952 was determined from TARGET whole-exome or whole- 
genome sequencing. Where possible, integrity of sample matching was verified by 
measurement of genome-wide genotype concordance. All genotypes are reported 
with respect to the minus strand of the human reference genome, hg19. 
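
Because the phase between rs3750952 and rs2168101 is unknown, the folded fraction min(C, G)/(C + G) measures imbalance without needing to know which allele lies on the risk haplotype. The sketch below computes that fraction, plus the Welch t statistic with Welch-Satterthwaite degrees of freedom, using only the standard library; for a P value one would typically call `scipy.stats.ttest_ind(a, b, equal_var=False)`. Function names are illustrative.

```python
import math
import statistics
from typing import Sequence, Tuple


def allelic_fraction(count_c: int, count_g: int) -> float:
    """Phase-agnostic allelic fraction min(C, G) / (C + G) at rs3750952.

    0.5 means balanced expression; values towards 0 indicate allelic imbalance.
    """
    total = count_c + count_g
    if total == 0:
        raise ValueError("no reads cover the SNP")
    return min(count_c, count_g) / total


def welch_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va = statistics.variance(a) / len(a)  # squared standard error of mean(a)
    vb = statistics.variance(b) / len(b)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df
```

Folding the fraction at 0.5 means a tumour with 30 C and 10 G reads scores the same as one with 10 C and 30 G reads, which is what an unphased comparison requires.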

To measure allele-specific expression directly at the intronic SNP we first 
purified the nuclear RNA fraction using the Cytoplasmic and Nuclear RNA 
purification Kit (Norgen Biotek, 21000) from four neuroblastoma cell lines 
(SNP rs2168101: SHSY5Y = G/G; NLF = G/T; NGP = G/T; NB1643 = T/T).
Ion AmpliSeq Designer v3.4.3 (Life Technologies White Glove service) was used 
to design amplicons targeting the intronic SNP rs2168101 and three additional 
exonic SNPs in linkage disequilibrium. Custom AmpliSeq libraries were prepared 
in triplicate for each cell line, indexed, pooled and sequenced using an Ion 318 
Chip on a Personal Genome Machine (Life Technologies). Reads were aligned to 
the hg19 reference genome and a synthetic genome showing the alternate allele at 
SNP rs2168101 at hg19 chr11:8255408 to account for any alignment bias. High- 
quality mapped reads containing the reference G allele or alternative T allele were 
counted and tested for significant deviation from 50:50 expression using a two- 
sided one-sample t-test (null hypothesis that allele fraction = 0.50) across three 
experimental replicates. Primer pair sequences:
rs1042359 forward: 5′-GTGTGGGAGACAAAUTCTTCCUGA-3′, reverse: 5′-GCCGGGCGUTACTGAACUT-3′;
rs3750952 forward: 5′-CGCAAGAUCAAGGACCGCTAUC-3′, reverse: 5′-GATGAGGTUGGCCTTGGTGUA-3′;
rs2168101 forward: 5′-CCUTTCCUGAAGGAGCGCAAA-3′, reverse: 5′-CACTTTCCATUAAGGAGATAGCAUCCC-3′;
rs204929 forward: 5′-CAAUCTAGGTUAAGAGCCGGACAAG-3′, reverse: 5′-GTGUCCAGCCGCAGCUA-3′.
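
With three replicate libraries per cell line, the one-sample t statistic has 2 degrees of freedom, and the t distribution with 2 degrees of freedom has a closed-form CDF, F(t) = 1/2 + t/(2√(2 + t²)), so the two-sided P value can be computed without a statistics library. The sketch below assumes exactly three replicates and uses made-up read counts; for other replicate numbers `scipy.stats.ttest_1samp` is the usual tool. Function names are illustrative.

```python
import math
import statistics
from typing import Sequence, Tuple


def g_allele_fraction(g_reads: int, t_reads: int) -> float:
    """Fraction of rs2168101 reads carrying the G allele in one replicate library."""
    return g_reads / (g_reads + t_reads)


def one_sample_t_df2(fractions: Sequence[float], null: float = 0.5) -> Tuple[float, float]:
    """Two-sided one-sample t-test against `null` for exactly three replicates.

    With n = 3 the statistic has 2 degrees of freedom, so the two-sided
    P value is 1 - |t| / sqrt(2 + t^2).
    """
    if len(fractions) != 3:
        raise ValueError("the closed-form P value here assumes exactly 3 replicates")
    sd = statistics.stdev(fractions)
    if sd == 0:
        raise ValueError("replicates are identical; t is undefined")
    t = (statistics.mean(fractions) - null) / (sd / math.sqrt(len(fractions)))
    p = 1.0 - abs(t) / math.sqrt(2.0 + t * t)
    return t, p


# Made-up G/T read counts for three replicate libraries of one heterozygous line.
fractions = [g_allele_fraction(70, 30), g_allele_fraction(72, 28), g_allele_fraction(68, 32)]
print(one_sample_t_df2(fractions))
```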

Reporter assays. Primers were designed to clone a 553-bp genomic region (hg19,
chr11:8255155-8255707) surrounding the candidate SNP rs2168101 at the GATA
transcription factor binding site from neuroblastoma cell line SKNSH (G/G) and
the matching site of BE2C (T/-). The cloned region did not contain other
statistically significant SNPs at the LMO1 locus. The primers were designed to introduce
5′ XhoI and 3′ BglII restriction sites, which are present in the MCS
of pGL4.26[luc2/minP/Hygro] (Promega, E8441). XhoI/BglII restriction enzyme-digested
fragments were sequence verified, gel purified, ligated into
pGL4.26[luc2/minP/Hygro], transformed into One Shot TOP10 chemically competent cells (Life
Technologies, C4040-10) and grown overnight at 37 °C on LB plates containing
50 μg ml⁻¹ ampicillin. Colonies positive for the vector containing the insert
were grown in 50 ml LB broth containing 50 μg ml⁻¹ ampicillin, and plasmids
were purified using a Qiagen Plasmid Midi Prep Kit (Qiagen, 12143). Transfection
into HEK293 cells at approximately 50% confluence was accomplished
using FuGENE 6 Transfection Reagent (Promega, E2691) at a 3 μl:1 μg FuGENE:DNA
ratio. Cells underwent selection in 150 μg ml⁻¹ hygromycin B (Mediatech,
30-240-CR), individual colonies were picked and grown, and the genotypes of
constructs were confirmed by fragment size and Sanger sequencing. Subsequently,
HEK293 + 553-bp insert cells and HEK293 + vector-only cells were grown in
96-well optical plates. On day 2, the cells were transiently transfected using FuGENE with the
Renilla expression control vector pGL4.74[hRluc/TK] (Promega, E6921) at a 1:500
dilution with respect to the luciferase vector. Luciferase assays were carried out 48 h
after Renilla transfection using the Dual-Luciferase Reporter Assay System (Promega,
E1910) with read-outs performed on a Dual Injector System for the GloMax-Multi
Detection System (Promega, E7081). Luciferase expression was normalized to
Renilla expression. All reporter assays were performed in quintuplicate (five
technical replicates each) across the experimental conditions: (1) HEK293T,
(2) HEK293T with empty vector, (3)-(6) four independent clones of HEK293T
with T allele construct, and (7)-(10) four independent clones of HEK293T with
G allele construct. Results were averaged across technical replicates, normalized
to empty vector, and reporter activities for T allele versus G allele clones (four
biological replicates each) were analysed by two-sided Welch’s t-test.
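
The normalization steps listed above (firefly/Renilla ratio per well, mean over technical replicates, division by the empty-vector mean) can be sketched as below. The data layout and function names are assumptions for illustration; the resulting per-clone fold changes for the four G-allele and four T-allele clones would then be compared with Welch's t-test.

```python
import statistics
from typing import Dict, List, Tuple

Wells = List[Tuple[float, float]]  # (firefly, Renilla) luminescence for each technical replicate well


def normalized_activity(wells: Wells) -> float:
    """Mean firefly/Renilla ratio across technical replicate wells."""
    return statistics.mean([firefly / renilla for firefly, renilla in wells])


def relative_to_empty_vector(clones: Dict[str, Wells], empty_vector: Wells) -> Dict[str, float]:
    """Per-clone reporter activity expressed as fold change over the empty-vector control."""
    baseline = normalized_activity(empty_vector)
    return {name: normalized_activity(wells) / baseline for name, wells in clones.items()}


# Invented readings: five wells each for the empty vector, one G-allele and one T-allele clone.
empty = [(100.0, 100.0)] * 5
clones = {"G_clone_1": [(3000.0, 100.0)] * 5, "T_clone_1": [(110.0, 100.0)] * 5}
print(relative_to_empty_vector(clones, empty))
```

Normalizing each well to Renilla first removes well-to-well differences in transfection efficiency before replicates are averaged.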

Construct risk allele (G): 
GTAGGGGTTGGAGTTCAGCCTGTTTCCCCTCCAATGTTGTTCCCCC 
CACATCCTGAGACTTAGGGGTGACCCTGGGTTGAGTGGACTGGTTTA 
TTCTGCTGGGCCCAGCGCATGCATCTGAGTGTGTGCCCAGGCGTGCG 
TGTCGGCGCAAACATCATCCATTGTGAAATATCAGTGTTTTCATGGGT 
GAGTAGTAATTACTGGGTAATGCTTTAAAACCTTTCCTGAAGGAGCGC 
AAAGCCATTTTTTTCTAAAGTCAGGAGTACATTAAAAGGATTACCATG 
TAGATTTGATTTTTAGATAACACTAAAATGGATCCCAAATGGACTTCA 
GCAAAGGGATGCTATCTCCTTAATGGAAAGTGCATGGCCCGAGGCTC 
AGGTCCCAGAGCCAGGCTGGGGAAGGAGGGAGGGAAGAGGTGTCTG 
CAGGGGGGCAGGCTGGCAGATTGGGTGGGGGCTAGGTGGGAATGGG 
GAAGGCAGAGCAGGAGGGAGGGCCTGGACCCTGTGGGGAGCTTATC 
CCTCCATCTGGGGAGCAGGAGACTACAGAGCCCCT. 
Construct protective allele (T):
GTAGGGGTTGGAGTTCAGCCTGTTTCCCCTCCAATGTTGTTCCCCC 
CACATCCTGAGACTTAGGGGTGACCCTGGGTTGAGTGGACTGGTTTA 
TTCTGCTGGGCCCAGCGCATGCATCTGAGTGTGTGCCCAGGCGTGCG 
TGTCGGCGCAAACATCATCCATTGTGAAATATCAGTGTTTTCATGGGT 
GAGTAGTAATTACTGGGTAATGCTTTAAAACCTTTCCTGAAGGAGCGC 
AAAGCCATTTTTTTCTAAAGTCAGGAGTACATTAAAAGGATTACCATG 
TAGATTTGATTTTTATATAACACTAAAATGGATCCCAAATGGACTTCAG 
CAAAGGGATGCTATCTCCTTAATGGAAAGTGCATGGCCCGAGGCTCA 
GGTCCCAGAGCCAGGCTGGGGAAGGAGGGAGGGAAGAGGTGTCTGC 
AGGGGGGCAGGCTGGCAGATTGGGTGGGGGGCTAGGTGGGAATGGG 
GAAGGCAGAGCAGGAGGGAGGGCCTGGACCCTGTGGGGAGCTTATC 
CCTCCATCTGGGGAGCAGGAGACTACAGAGCCCCT. 

Cell culture and protein lysates. Jurkat T-ALL and neuroblastoma cell lines 
were sourced from the American Type Culture Collection, and kept
in growth medium of RPMI+10% heat-inactivated FCS with 1% penicillin- 
streptomycin, as previously described". Cells were lysed for protein, with subse- 
quent protein quantified by spectrophotometry, as previously described”!. Protein 
was resolved on 8-14% Tris-Bis gels, transferred to PVDF membranes, blocked and 
subjected to primary and secondary antibodies, as previously described”). Primary 
antibodies were anti-GATA3 (Pierce Biotechnology, 1:1,000), anti-LMO1 (Bethyl 
Laboratories, 1:1,000) and α-tubulin (Cell Signaling Technologies, 1:1,000). Blots
were developed with horseradish peroxidase (HRP)-conjugated secondary antibodies
(Cell Signaling Technologies, 1:5,000) and Protein-plus Dura ECL Reagent
(Thermo-Fisher Scientific). All cell lines are genotyped semiannually to confirm
identity and are also tested routinely for mycoplasma contamination.

Genome-wide occupancy analysis. ChIP coupled with massively parallel DNA 
sequencing (ChIP-seq) was performed as previously described’, The following 
antibodies were used for ChIP: anti-H3K27ac (Abcam, ab4729) and anti-GATA3
(Santa Cruz, sc-22206X). For each ChIP, 10 μg of antibody was added to 3 ml of
sonicated nuclear extract. Illumina sequencing, library construction and ChIP-seq 
analysis methods were previously described”’. 

ChIP-seq processing. Reads were aligned to build hg19 of the human genome 
using bowtie with parameters -k 2 -m 2 -e 70 --best and -l set to the read length™.
For visualization in the UCSC genome browser in Fig. 3a, c and Extended Data
Fig. 6 (ref. 25), WIG files were created from aligned ChIP-seq read positions using
MACS 1.4.2 with parameters -w -S --space=50 --nomodel --shiftsize=200 to artificially
extend reads to be 200 bp and to calculate their density in 50-bp bins”°. Read
counts in 50-bp bins were then normalized to the millions of mapped reads, giving 
reads per million values. 
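
The read extension and binning step can be approximated as follows: extend each read in its 3′ direction to 200 bp, count the extended fragments that overlap each 50-bp bin, and scale by millions of mapped reads. This is a simplified sketch of the idea under those assumptions, not a reimplementation of MACS 1.4.2's WIG output, whose shifting and pileup conventions may differ; the function name is illustrative.

```python
from typing import List, Sequence, Tuple


def binned_rpm(
    reads: Sequence[Tuple[int, str]],
    region_length: int,
    total_mapped_reads: int,
    fragment_length: int = 200,
    bin_size: int = 50,
) -> List[float]:
    """Reads-per-million density of extended reads in fixed-size bins.

    reads: (5' end position, strand) pairs in 0-based coordinates within the region.
    Each read is extended in its 3' direction to `fragment_length` bp, and a bin is
    credited once for every extended fragment that overlaps it.
    """
    n_bins = (region_length + bin_size - 1) // bin_size
    counts = [0] * n_bins
    for pos, strand in reads:
        if strand == "+":
            start, end = pos, pos + fragment_length  # half-open [start, end)
        else:
            start, end = pos - fragment_length + 1, pos + 1
        start, end = max(start, 0), min(end, region_length)
        if start >= end:
            continue
        for b in range(start // bin_size, (end - 1) // bin_size + 1):
            counts[b] += 1
    scale = 1e6 / total_mapped_reads
    return [c * scale for c in counts]
```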

ChIP-seq allele specificity analysis. To detect preferential ChIP-seq coverage
of one allele, which implies preferential binding of the protein to one allele over
the other, we counted the reads at rs2168101 using samtools mpileup. Applied to
the aligned reads described above, this gave a count of reads carrying each base at
this position. The fraction of reads with the risk allele versus the protective allele is
reported in Fig. 3b. Statistical tests for preferential allelic binding were performed
using a two-sided binomial test.
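The allelic test can be sketched as an exact two-sided binomial test of risk versus protective read counts. This assumes the null hypothesis is equal coverage of the two alleles (p = 0.5), which the text implies but does not state; `allelic_binomial_p` is a hypothetical name. For p = 0.5 the result should agree with `scipy.stats.binomtest(k, n, p=0.5)`.

```python
from math import comb


def allelic_binomial_p(risk_reads: int, protective_reads: int) -> float:
    """Exact two-sided binomial test of allele read counts under p = 0.5."""
    n = risk_reads + protective_reads
    if n == 0:
        raise ValueError("no informative reads at this position")
    k = min(risk_reads, protective_reads)
    # With p = 0.5 the null distribution is symmetric, so the two-sided
    # P value is twice the smaller tail probability, capped at 1.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)
```

With 9 risk and 1 protective read, the two-sided P value is 22/1024, about 0.021; a perfectly balanced 5:5 split gives 1.0.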

Enriched regions. Regions enriched in ChIP-seq signal were identified twice using
MACS with the corresponding control and parameters --keep-dup=all -p 1e-9
or --keep-dup=1 -p 1e-9. Super-enhancers in SHSY5Y and KELLY were
identified using ROSE (https://bitbucket.org/young_computation/rose) with
modifications based on ref. 14. In brief, peaks of H3K27ac were identified using MACS
as described above, and their union was used as constituent enhancers. These peaks
were stitched computationally if they were within 12,500 bp of each other, although
peaks fully contained within ±2,000 bp of a RefSeq promoter were excluded
from stitching. The stitched enhancers were ranked by their H3K27ac signal
(length × density), with input signal in the corresponding region subtracted.
Super-enhancers were separated from typical enhancers by geometrically determining
the point at which a line of slope 1 (parallel to y = x) is tangent to the curve of
stitched enhancer rank versus stitched enhancer signal. Stitched enhancers above
this point are considered super-enhancers.
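The stitching and cut-off steps can be sketched as below. This is written from the description above, not taken from the ROSE source, and it makes several assumptions: peaks are (start, end) pairs on a single chromosome, promoter-contained peaks are simply dropped, negative input-subtracted signals are clipped to zero, and the signal axis is rescaled to span the same range as the rank axis so that a slope-1 tangent is meaningful. Both function names are hypothetical.

```python
def stitch_enhancers(peaks, tss_positions, stitch_dist=12_500, tss_window=2_000):
    """Merge constituent peaks on one chromosome into stitched enhancers.

    Peaks lying entirely within +/- tss_window of any TSS are left out of
    stitching (in this sketch they are simply dropped).
    """
    def in_promoter(start, end):
        return any(t - tss_window <= start and end <= t + tss_window
                   for t in tss_positions)

    kept = sorted(p for p in peaks if not in_promoter(*p))
    stitched = []
    for start, end in kept:
        # Merge with the previous region if the gap is at most stitch_dist.
        if stitched and start - stitched[-1][1] <= stitch_dist:
            prev_start, prev_end = stitched[-1]
            stitched[-1] = (prev_start, max(prev_end, end))
        else:
            stitched.append((start, end))
    return stitched


def super_enhancer_cutoff(signals):
    """Signal at which a slope-1 line is tangent to the rank-vs-signal curve."""
    ys = sorted(max(s, 0.0) for s in signals)  # ascending; negatives clipped
    n = len(ys)
    if n == 0 or ys[-1] <= 0:
        return float("inf")
    # Rescale signal so both axes span 0..n; slope 1 is then the box diagonal.
    scaled = [y * n / ys[-1] for y in ys]
    # A slope-1 line touches the curve where (scaled signal - rank) is smallest.
    best = min(range(n), key=lambda i: scaled[i] - i)
    return ys[best]
```

For signals `[1, 1, 1, 1, 1, 1, 1, 1, 10, 50]` the tangent falls at the last of the low values, so the cut-off is 1 and only the regions with signal 10 and 50 lie above it.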

To account for the known focal amplification of the MYCN locus, which contains
enhancers, in KELLY, BE2, BE2C and NGP neuroblastoma cells, we modified
our pipeline slightly. Because MACS is insensitive in identifying peaks
in focally amplified DNA, we identified peaks of H3K27ac versus input using
MACS2 callpeak (https://pypi.python.org/pypi/MACS2) with parameters --broad
--keep-dup=1 -p 1e-9 and --broad --keep-dup=all -p 1e-9. The union of these
MACS2 calls was used as constituent enhancers for ROSE, with the remaining
parameters as described above. For Fig. 3d, most of the curve represents the
analysis performed using MACS-identified constituents; the rank and signal of the
MYCN-associated enhancer come from the MACS2-identified set of constituents,
to remain consistent with the conclusions and methods as previously described (ref. 4).
The curve output from the MACS-identified enhancers was vertically compressed,
and a point representing the signal of the MYCN-associated super-enhancer from
the MACS2-identified enhancers was added in Illustrator. Super-enhancers were
assigned to the single expressed RefSeq transcript whose transcription start site
was nearest the centre of the stitched region. Expressed genes were those in the top
two-thirds of RefSeq transcripts ranked by their promoter (transcription start site
±500 bp) H3K27ac signal, as determined by bamToGFF
(https://github.com/BradnerLab/pipeline) with parameters -e 200 -m 1 -r -d.
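The gene-assignment rule can be sketched as follows. The sketch assumes a single chromosome, a precomputed promoter H3K27ac signal per transcript (for example from bamToGFF), and rounding the top two-thirds down; the function and argument names are hypothetical.

```python
def assign_super_enhancers(super_enhancers, tss, promoter_signal):
    """Assign each stitched region to the nearest 'expressed' transcript.

    super_enhancers: list of (start, end) on one chromosome
    tss: dict transcript -> transcription start site coordinate
    promoter_signal: dict transcript -> H3K27ac signal in TSS +/- 500 bp
    """
    # Rank transcripts by promoter signal; keep the top two-thirds (rounded down).
    ranked = sorted(tss, key=lambda t: promoter_signal[t], reverse=True)
    expressed = ranked[: (2 * len(ranked)) // 3]
    assignments = []
    for start, end in super_enhancers:
        centre = (start + end) / 2
        # Nearest expressed TSS to the centre of the stitched region.
        assignments.append(min(expressed, key=lambda t: abs(tss[t] - centre)))
    return assignments
```

In a toy example with TSSs A = 1,000, B = 50,000 and C = 90,000 and promoter signals 10, 5 and 1, a region spanning 80,000-100,000 is centred on C's TSS, but C falls in the bottom third by promoter signal, so the region is assigned to B.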

Clone cell generation. LMO1 cDNA was amplified from pcDNA3-LMO1 and
subcloned into the XhoI and NotI sites of the lentiviral vector pOZ-FHN. Lentivirus
expressing FH-LMO1 was propagated in HEK293T cells by cotransfection with
psPAX2 and pMD2.G plasmids (Addgene) using FuGENE 6 (Roche) by
standard methodologies. Viral supernatant was recovered, and KELLY cells were
infected with lentivirus expressing FH-LMO1 or with empty vector alone, as previously
described. Cells were sorted for IL2R expression, and IL2R-positive cells were
used to establish single-cell clones. Expression of FH-LMO1 was assessed by
western blotting as above to confirm overexpression.

siRNA and growth assays. SHSY5Y, KELLY and KELLY clone cells were reverse
transfected with 100 nM of either non-targeting (control siRNA-1)
or GATA3-targeting siRNA-1 or -2 (Ambion) for 6 h with Lipofectamine 2000
(1:1,000) in Opti-MEM I before being replated into growth assays in normal RPMI
growth media. Cells (2 x 10°) were replated in triplicate for counting at 24, 48 and 
72h post-transfection by manual hemocytometry. Cells (5 x 10°) were replated for 
protein lysates at the same time points. All experiments were repeated in triplicate,
with 9 technical replicates for all cell growth assays, as described.
Statistical tests were performed using a two-sided Welch's t-test.
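The Welch statistic used for the growth assays can be written out directly. The sketch below computes only the t statistic and the Welch-Satterthwaite degrees of freedom with the standard library. Converting these to a two-sided P value needs the t distribution, for which `scipy.stats.ttest_ind(a, b, equal_var=False)` is one option. The function name is hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    # Squared standard errors of each group mean (sample variance, n - 1).
    va = variance(a) / len(a)
    vb = variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df
```

Unlike Student's t-test, this does not assume equal variances in the two groups, which suits cell counts whose spread can differ between control and knockdown conditions.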

Data access. GWAS and sequencing data used for this analysis are available in
dbGaP under accessions phs000124 and phs000467. The tumour genomics data
are also available through the Therapeutically Applicable Research to Generate
Effective Treatments (TARGET) data matrix portal
(http://target.nci-nih.gov/dataMatrix/TARGET_DataMatrix.html). Data generated
through the ENCODE project, including DNase I hypersensitivity sequencing and
ChIP-sequencing data, were obtained from
ftp://hgdownload.cse.ucsc.edu/goldenPath/hg19/encodeDCC/. Aligned sequencing
read (BAM) files were used as provided from the FTP site. The mammalian
evolutionary conservation track representing 46 mammalian species
(phastCons46wayPlacental) was obtained from the UCSC Table Browser
(http://genome.ucsc.edu/cgi-bin/hgTables?command=start). JASPAR-annotated
transcription factor binding site position frequency matrices were obtained from
http://jaspar.genereg.net/html/DOWNLOAD/JASPAR_CORE/pfm/nonredundant/pfm_all.txt.
New ChIP-seq data sets generated in this study are available under SuperSeries GSE65664.
Extended Data Figure 1 | The imputed SNP rs2168101 is associated
with neuroblastoma, and the risk G allele is enriched in neuroblastoma
cases. Ternary density plots of genotype probability vectors [P(G/G),
P(G/T), P(T/T)] output from IMPUTE2 for rs2168101 in the European-
American cohort. Vertices represent 'perfect' confidence calls in which
P(genotype) = 1; dotted lines represent decision boundaries for genotype
calling based on the most probable genotype. All plots were normalized by
the total number of individuals studied and subjected to 2D Gaussian
kernel smoothing. Left, 2,101 cases (red); centre, 4,202 controls (blue);
right, the difference between cases and controls highlights enrichment of the
G/G genotype (homozygous risk) in cases and of the G/T and T/T genotypes
in controls. Validation by PCR-based genotyping in 146 of the
2,101 European-American cases confirmed 86% concordance with
imputation based on most probable genotypes (Supplementary Table 1).
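Genotype calling by most probable genotype, and the concordance check against PCR-based genotypes, can be sketched as follows. The probability vectors follow the [P(G/G), P(G/T), P(T/T)] order used in this caption; the function names and the toy data in the usage note are hypothetical.

```python
GENOTYPES = ("G/G", "G/T", "T/T")


def call_genotype(probs):
    """Most probable genotype from an IMPUTE2-style vector [P(G/G), P(G/T), P(T/T)]."""
    best = max(range(len(GENOTYPES)), key=lambda i: probs[i])
    return GENOTYPES[best]


def concordance(imputed_probs, pcr_calls):
    """Fraction of samples whose most probable imputed genotype matches the PCR call."""
    matches = sum(call_genotype(p) == g for p, g in zip(imputed_probs, pcr_calls))
    return matches / len(pcr_calls)
```

On four toy samples where three of the most probable imputed calls match the PCR genotype, `concordance` returns 0.75.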
Extended Data Figure 2 | Conditional analysis reveals a single
neuroblastoma association signal at the LMO1 locus and that rs2168101
is the most associated variant. a, Imputation-based neuroblastoma
association study conditional on rs2168101. No variants remain
significant after conditioning on rs2168101 (most significant variant:
rs34544683, nominal P = 9.0 × 10⁻⁴, Bonferroni P = 1). b, Reciprocal
analysis conditioned on each of 27 SNPs with a nominal P < 1 × 10⁻⁵.
For rs2168101, the maximum (least significant) P value across all
non-rs2168101 conditional tests is shown, to illustrate the extent
to which the signal at rs2168101 can be accounted for by other variants
(a similar maximum P value statistic is plotted for other variants). Notably,
rs2168101 remained significant (worst-case nominal P = 2.6 × 10⁻⁷,
Bonferroni P = 0.002) across all tests. These results are consistent with a
single underlying signal at the LMO1 locus, and reaffirm that rs2168101
is the single best causal SNP candidate because its association with
neuroblastoma cannot be accounted for by other single variants.
Extended Data Figure 3 | The risk G allele of rs2168101 is associated
with decreased event-free and overall survival in the European-American
discovery cohort. Because genotypes for rs2168101 are
imputed within the European-American discovery cohort, the most likely
genotype for each neuroblastoma case was called based on the maximum
of P(G/G), P(G/T) and P(T/T) from IMPUTE2. P values reflect Cox
proportional hazards regressions adjusted for MYCN amplification status
and for the first 20 MDS components to account for population stratification.
a, Kaplan-Meier plot for event-free survival. Neuroblastoma cases with
rs2168101 = G/G versus rs2168101 = G/T or T/T showed significantly
worse event-free survival (P = 0.0004). b, Kaplan-Meier plot for
overall survival. Neuroblastoma cases with rs2168101 = G/G versus
rs2168101 = G/T or T/T showed significantly worse overall survival
(P = 0.0004). Censored data points are shown as black crosses. The numbers
of at-risk patients at each time point for both event-free survival and overall
survival are plotted below each respective Kaplan-Meier plot.
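The Kaplan-Meier curves and numbers at risk shown here can be illustrated with a minimal product-limit estimator. This is a generic sketch (function name and toy data hypothetical), not the survival software used for the analysis, and it does not reproduce the Cox regressions behind the reported P values.

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times: follow-up times; events: 1 if the event (relapse or death) was
    observed, 0 if the patient was censored at that time.
    Returns (time, survival, number_at_risk) at each time with >= 1 event.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = censored = 0
        # Group all observations tied at time t.
        while i < len(data) and data[i][0] == t:
            if data[i][1]:
                deaths += 1
            else:
                censored += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / at_risk
            curve.append((t, surv, at_risk))
        # Patients with an event or censoring at t leave the risk set afterwards.
        at_risk -= deaths + censored
    return curve
```

With follow-up times 1, 2, 3, 4 and the patient at time 2 censored, survival drops to 0.75, then 0.375, then 0; the censored patient does not lower the curve but does shrink the number at risk from 3 to 2.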
Panel a: LMO1 mRNA expression for 24 neuroblastoma cell lines measured by microarray (y axis, normalised LMO1 expression).
Extended Data Figure 4 | rs2168101 genotype is associated with total
and allele-specific LMO1 expression in neuroblastoma cell lines and
primary tumours, and allele-specific expression differences are not
driven by somatic DNA copy number alterations. a, Neuroblastoma
cell line LMO1 mRNA expression, as quantified by Affymetrix U95Av2
oligonucleotide arrays and normalized as described, was significantly
higher in cell lines harbouring homozygous risk alleles (G/G) than in
those harbouring heterozygous alleles (G/T) (P = 0.047, two-tailed Mann-Whitney test).
b, Allele-specific expression measured by RNA-seq from primary
neuroblastoma tumours. Because rs2168101 is an intronic SNP that is spliced
out of mRNA, the synonymous exonic SNP rs3750952 was used as a
surrogate for measuring allele-specific expression in 39 primary tumours
which are heterozygous for rs3750952 (C/G genotype). The DNA allelic
fraction for rs3750952 determined by whole-exome sequencing is plotted 
on the x axis, whereas the RNA allele fraction for rs3750952 determined 
by mRNA-seq is plotted on the y axis. The solid line indicates where DNA 
and RNA allele fractions are equal and dotted lines indicate the boundary 
where DNA and RNA allele fractions are within 10% of each other. 
Tumours that are heterozygous for rs2168101 (G/T genotype, red dots) 
exhibit greater RNA allelic imbalance (P=5.3 x 10°) than homozygous 
controls (rs2168101 = G/G genotype, black dots). By contrast, DNA allelic 
imbalance is no different between G/T versus G/G tumours (P= 0.79), 
indicating that a cis-acting regulatory mechanism, rather than somatic 
DNA alterations, drives LMO1 allelic expression differences. 
Extended Data Figure 5 | Expression of LMO1 and GATA-family
transcription factors in neuroblastoma primary tumours and cell lines.
a, RPKM expression measurements from mRNA-seq are summarized
via boxplots for 127 primary neuroblastoma tumours for the paralogues
GATA1 through GATA6. Both GATA2 (median RPKM: 56) and GATA3
(median RPKM: 110) are expressed, on average, 1-4 orders of magnitude
more highly than other members of the GATA family in neuroblastoma.
b, Neuroblastoma cell lines were lysed for protein and
resolved by SDS-PAGE as previously described. Jurkat T-ALL cells are
shown as a positive control for LMO1 and GATA3 expression. Data are
representative of at least three independent blots. The rs2168101 genotype
is shown below individual cell lines.
Extended Data Figure 6 | The LMO1 super-enhancer is observed in
neuroblastoma cell lines containing the G allele of rs2168101 and is
highly tissue-specific. a, H3K27ac signal across all enhancers in SHSY5Y
(MYCN not amplified; rs2168101 = G/G), BE2 (MYCN-amplified;
rs2168101 = G/T) and NGP (MYCN-amplified; rs2168101 = G/T) is
shown. Enhancers are ranked by their H3K27ac signal minus input
signal and are geometrically divided into two populations (see Methods).
Super-enhancers are those at the high end of the distribution and are
associated with key genes in neuroblastoma, highlighted on the curve.
LMO1-associated super-enhancers were identified in BE2, KELLY and
SHSY5Y cells, which all contain the G allele of rs2168101, but not in BE2C
cells, in which the G allele is absent. b, H3K27ac ChIP-seq in the Jurkat cell
line. c, All ENCODE non-neuroblastoma cell lines with H3K27ac ChIP-seq
profiling. All non-neuroblastoma cell lines considered showed little or
no evidence of an active enhancer element within the first intron of the
LMO1 gene locus, consistent with a tissue- and disease-specific enhancer
overlying the neuroblastoma causal SNP rs2168101.
Extended Data Figure 7 | Depletion of GATA3 results in suppression
of cell growth that is rescued by forced LMO1 expression in
neuroblastoma. Neuroblastoma cells SHSY5Y, KELLY, KELLY
overexpressing control vector (EV) and KELLY with forced LMO1
overexpression (LMO1-1 and LMO1-2) were treated with non-targeting
(siControl) or GATA3-targeting (siGATA3-1, siGATA3-2) siRNAs,
and cells were counted at 24, 48 and 72 h after transfection. Rescue of
suppressed cell growth after GATA3 depletion by forced LMO1 expression
in LMO1-1 and LMO1-2 at 72 h is shown at the bottom. Growth
curves over 72 h are shown (to accompany Fig. 3f). Error bars
denote ±s.e.m.; n = 9 technical replicates.
Extended Data Table 1 | Germline variants from 1000 Genomes Project with P < 1 × 10⁻⁵ association with neuroblastoma susceptibility from imputation-based SNPTEST analysis of European-American cohort


Variant ID (rsID) Chromosome Position (hg19)
rs1918/7155. 464
rs11041809 11 8231605
rs11041811 11 8231665
rs11041812 11 8231684
rs11041813 11 8235207
rs10839999 11 8236083
rs10769885 11 8236262
rs4758049 11 8238428
rs4758050 11 8238545
rs4758051 11 8238639
rs10840000 11 8240113
rs7933766 11 8240464
rs11041816 11 8243798
rs4315061 11 8247020
rs72474792 11 8247885
rs12797723 11 8247984
rs2290451 11 8248440
17952320 11 8250143
rs4758317 11 8250811
rs11041820 11 8251438
rs3750952 11 8251921
rs110419 11 8252853
rs110420 11 8253049
rs204928 11 8254433
rs204926 11 8255106
rs2168101 11 8255408
rs7948497 11 8255855


Alleles* (Ref/Alt)

A/G
C/T
C/T
T/C
G/A
C/A
A/C
G/C
G/A
G/C
G/A
A/G
T/C
TATAAAA/T
C/T
C/G
G/C
C/A
G/A
G/C
A/G
T/C
A/G
G/A
C/A
C/G


Alt Allele 
Frequency 
Cases 
0.035 (n=2101)
0.498 (n=2101) 
0.492 (n=2101) 
0.492 (n=2101) 
0.478 (n=2101) 
0.480 (n=2101) 
0.513 (n=2101) 
0.511 (n=2101) 
0.511 (n=2101) 
0.510 (n=2101) 
0.509 (n=2101) 
0.511 (n=2101) 
0.397 (n=2101) 
0.425 (n=2101) 
0.524 (n=2101) 
0.443 (n=2101) 
0.295 (n=2101) 
0.408 (n=2101) 
0.514 (n=2101) 
0.294 (n=2101) 
0.408 (n=2101) 
0.441 (n=2101) 
0.441 (n=2101) 
0.444 (n=2101) 
0.440 (n=2101) 
0.242 (n=2101) 
0.479 (n=2101) 


*Forward strand hg19, imputed genotypes from IMPUTE2, frequencies as reported by SNPTEST.


†SNPTEST, frequentist score test with additive model, adjusted for gender and top 20 MDS components.


Alt Allele 
Frequency 
Controls 

0.054 (n=4202)
0.440 (n=4202) 
0.434 (n=4202) 
0.433 (n=4202) 
0.420 (n=4202) 
0.423 (n=4202) 
0.453 (n=4202) 
0.452 (n=4202) 
0.452 (n=4202) 
0.452 (n=4202) 
0.450 (n=4202) 
0.453 (n=4202) 
0.456 (n=4202) 
0.490 (n=4202) 
0.456 (n=4202) 
0.514 (n=4202) 
0.255 (n=4202) 
0.480 (n=4202) 
0.447 (n=4202) 
0.253 (n=4202) 
0.481 (n=4202) 
0.511 (n=4202) 
0.511 (n=4202) 
0.512 (n=4202) 
0.510 (n=4202) 
0.313 (n=4202) 
0.419 (n=4202) 


P value†


1.67E-07 
5.06E-07 
3.77E-08 
7.48E-08 
7.34E-08 
1.22E-07 
6.22E-08 
2.09E-07 
8.99E-10 
1.25E-09 
2.04E-10 
2.05E-10 
8.20E-06 
3.03E-11 
5.76E-10 
6.77E-06 
1.89E-11 
3.16E-10 
3.36E-10 
9.85E-10 
1.97E-11 
4.14E-16 
4.05E-10 


© 2015 Macmillan Publishers Limited. All rights reserved 


Odds Ratio‘ 


64 (0.53-0.7 
0.80 (0.74-0.87) 
0.80 (0.74-0.87) 
0.80 (0.74-0.87) 
0.81 (0.75-0.87) 
0.81 (0.75-0.88) 
0.80 (0.74-0.87) 
0.81 (0.74-0.87) 
0.81 (0.74-0.87) 
0.81 (0.75-0.87) 
0.80 (0.74-0.87) 
0.81 (0.75-0.88) 
0.77 (0.71-0.84) 
0.78 (0.72-0.84) 
0.77 (0.71-0.84) 
0.77 (0.71-0.84) 
1.23 (1.12-1.34) 
1.31 (1.21-1.42) 
0.78 (0.72-0.84) 
1.23 (1.12-1.34) 
0.76 (0.70-0.83) 
0.78 (0.72-0.84) 
0.78 (0.72-0.84) 
0.78 (0.72-0.85) 
0.76 (0.70-0.82) 
0.67 (0.61-0.74) 
1.30 (1.20-1.41) 
Extended Data Table 2 | Clinical characteristics for patients in referenced sequencing data sets

Characteristic | Whole Exome Sequencing (Blood and Tumor), N = 222* | Whole Genome Sequencing (Blood and Tumor), N = 136 | LMO1 Targeted Sequencing (Blood), N = 183 | Whole Transcriptome mRNA Sequencing (Tumor), N = 127
Age
< 18 mos | 0 (0%) | 32 (24%) | 82 (45%) | 8 (6%)
≥ 18 mos | 219 (100%) | 103 (76%) | 101 (55%) | 119 (94%)
Not Available | 3 | 1 | 0 | 0
INSS Stage†
Stage 1 | 0 (0%) | 0 (0%) | 39 (21%) | 0 (0%)
Stage 2A | 0 (0%) | 0 (0%) | 13 (7%) | 0 (0%)
Stage 2B | 0 (0%) | 1 (1%) | 18 (10%) | 0 (0%)
Stage 3 | 0 (0%) | 6 (4%) | 27 (15%) | 6 (5%)
Stage 4 | 219 (100%) | 105 (78%) | 78 (43%) | 121 (95%)
Stage 4S | 0 (0%) | 23 (17%) | 8 (4%) | 0 (0%)
Not Available | 3 | 1 | 0 | 0
MYCN
Not Amplified | 143 (67%) | 102 (76%) | 151 (83%) | 95 (75%)
Amplified | 71 (33%) | 32 (24%) | 30 (17%) | 31 (25%)
Not Available | 8 | 2 | 2 | 1
Histology
Favorable | 4 (2%) | 29 (23%) | 95 (54%) | 9 (8%)
Unfavorable | 187 (98%) | 96 (77%) | 82 (46%) | 107 (92%)
Not Available | 31 | 11 | 6 | 11
DNA Index
Hyperdiploid | 117 (54%) | 81 (61%) | 121 (67%) | 67 (53%)
Diploid | 98 (46%) | 52 (39%) | 59 (33%) | 59 (47%)
Not Available | 7 | 3 | 3 | 1
Risk
Low | 0 (0%) | 15 (11%) | 64 (35%) | 0 (0%)
Intermediate | 0 (0%) | 14 (10%) | 49 (27%) | 6 (5%)
High | 219 (100%) | 106 (79%) | 69 (38%) | 121 (95%)
Not Available | 3 | 1 | 1 | 0

*There is an overlap of 59 neuroblastoma patients with both whole-exome and whole-genome sequencing. Patients with targeted sequencing are all unique and do not overlap with whole-exome or whole-genome cases, yielding 482 unique patients with exonic DNA sequencing of LMO1.
†International Neuroblastoma Staging System (INSS).
Extended Data Table 3 | Association of rs2168101 with clinical/biological co-variates

rs2168101 genotypes*
Clinical/Biological Co-variate | GG | GT | TT
Stage 4‡ | … | … | …
Not Stage 4 | 611 (56%) | 400 (37%) | 74 (7%)
MYCN Amplified | 183 (55%) | 115 (34%) | 36 (11%)
MYCN Non-Amplified | 881 (59%) | 525 (35%) | 83 (6%)
High-Risk | 523 (63%) | 263 (32%) | 47 (6%)
Not High-Risk | 594 (56%) | 398 (37%) | 73 (7%)
Unfavorable Histology | 454 (61%) | 287 (32%) | 48 (6%)
Favorable Histology | 527 (57%) | 336 (36%) | 62 (7%)
DNA Index Hyperdiploid | 685 (59%) | 412 (35%) | 71 (6%)
DNA Index Diploid | 324 (57%) | 198 (35%) | 43 (8%)
Age >= 18 mos | 621 (61%) | 346 (34%) | 55 (5%)
Age < 18 mos | 529 (57%) | 338 (36%) | 68 (7%)

Association result (P value†; Odds Ratio): 0.00297, 1.39 (1.12-1.73); 0.00174, 0.76 (0.65-0.90); 0.14479, 0.88 (0.73-1.05); 0.32009, 0.91 (0.76-1.09); 0.01448, 0.82 (0.69-0.96).

*Reverse strand hg19, imputed genotypes from IMPUTE2, genotype frequencies as reported by SNPTEST.
†SNPTEST, frequentist score test with additive model, adjusted for gender and top 20 MDS components.
‡International Neuroblastoma Staging System.
A mechanism for the suppression of homologous recombination in G1 cells


Alexandre Orthwein, Sylvie M. Noordermeer, Marcus D. Wilson, Sébastien Landry, Radoslav I. Enchev, Alana Sherker,
Meagan Munro, Jordan Pinder, Jayme Salsman, Graham Dellaire, Bing Xia, Matthias Peter & Daniel Durocher


DNA repair by homologous recombination¹ is highly suppressed
in G1 cells²,³ to ensure that mitotic recombination occurs
solely between sister chromatids⁴. Although many homologous
recombination factors are cell-cycle regulated, the identity of 
the events that are both necessary and sufficient to suppress 
recombination in G1 cells is unknown. Here we report that the 
cell cycle controls the interaction of BRCA1 with PALB2-BRCA2 
to constrain BRCA2 function to the S/G2 phases in human cells. 
We found that the BRCA1-interaction site on PALB2 is targeted by 
an E3 ubiquitin ligase composed of KEAP1, a PALB2-interacting
protein⁵, in complex with cullin-3 (CUL3)-RBX1 (ref. 6). PALB2
ubiquitylation suppresses its interaction with BRCA1 and is 
counteracted by the deubiquitylase USP11, which is itself under 
cell cycle control. Restoration of the BRCA1-PALB2 interaction 
combined with the activation of DNA-end resection is sufficient to 
induce homologous recombination in G1, as measured by RAD51 
recruitment, unscheduled DNA synthesis and a CRISPR-Cas9-based
gene-targeting assay. We conclude that the mechanism
prohibiting homologous recombination in G1 minimally consists 
of the suppression of DNA-end resection coupled with a multi-step
block of the recruitment of BRCA2 to DNA damage sites
that involves the inhibition of BRCA1-PALB2-BRCA2 complex 
assembly. We speculate that the ability to induce homologous 
recombination in G1 cells with defined factors could spur the 
development of gene-targeting applications in non-dividing cells. 

The breast and ovarian tumour suppressors BRCA1, PALB2 and 
BRCA2 promote DNA double-strand break (DSB) repair by homologous
recombination”-*. BRCA1 promotes DNA-end resection to
produce the single-stranded (ss)DNA necessary for homology search 
and strand invasion! and it also interacts with PALB2 (refs 10-12) to 
direct the recruitment of BRCA2 (ref. 10) and RAD51 (refs 13 and 
14) to DSB sites. The accumulation of BRCA1 on the chromatin that 
flanks DSB sites is suppressed in G1 cells', reminiscent of the potent 
inhibition of homologous recombination in this phase of the cell cycle. 
Since the inhibition of BRCA1 recruitment in G1 is dependent on the 
53BP1 and RIF1 proteins!>1¢, two inhibitors of end resection!>~!, this 
regulation of BRCA1 was originally viewed in light of its function in 
DNA-end processing. 

However, as BRCA1 is also involved in promoting the recruitment 
of BRCA2 through its interaction with PALB2, we asked whether 
inducing BRCA1 recruitment to DSB sites in G1, through mutation
of 53BP1 (also known as TP53BP1) by genome editing (53BP1Δ;
Extended Data Fig. 1a-c), also resulted in BRCA2 accumulation into
ionizing-radiation-induced foci. To our surprise, and in contrast with
BRCA1, we found that neither BRCA2 nor PALB2 is recruited to
G1 DSB sites in U2OS cells lacking 53BP1 at ionizing radiation doses
ranging from 2 to 20 Gy (Fig. 1a, b and Extended Data Fig. 1d, e).


Since BRCA1 and PALB2 interact directly'®", this result suggested that
G1 cells may block BRCA2 recruitment by suppressing the BRCA1-PALB2
interaction. Indeed, while PALB2 interacts with BRCA2 irrespective
of cell cycle position, it interacts efficiently with BRCA1 only
during S phase (Fig. 1c). The presence of DNA damage led to the loss
of the residual PALB2-BRCA1 interaction in G1 whereas it had little
impact on the assembly of the BRCA1-PALB2-BRCA2 complex in
S phase (Fig. 1c). Since all proteins were expressed in G1 (Fig. 1c),
our results suggest that the assembly of the BRCA1-PALB2-BRCA2
complex is controlled during the cell cycle, possibly to restrict the
accumulation of BRCA2 at DSB sites to the S/G2 phases.

We confirmed these results using a single-cell assay assessing the 
colocalization, at an integrated lacO array”, of an mCherry-tagged 
LacR-BRCA1 fusion protein with GFP-tagged PALB2 (Extended Data 
Fig. 2a). This LacR/lacO system recapitulated the cell-cycle-dependent
and DNA-damage-sensitive BRCA1-PALB2 interaction (Extended
Data Fig. 2b) and enabled us to determine that sequences on PALB2
located outside its amino-terminal BRCA1-interaction domain
(residues 1-50) were responsible for the cell-cycle-dependent
regulation of its association with BRCA1 (Extended Data Fig. 2c, d).
Further deletion mutagenesis identified a single region, encompassed
within residues 46-103 of PALB2 (Extended Data Fig. 2e, f), responsible
for the cell-cycle-dependent regulation of the BRCA1-PALB2 interaction. This
region corresponds to the interaction site for KEAP1 (ref. 5), identifying 
this protein as a candidate regulator of the BRCA1-PALB2 interaction. 

KEAP1 is a substrate adaptor for a CUL3-RING ubiquitin (Ub) 
ligase (CRL3) that targets the antioxidant regulator NRF2 for
proteasomal degradation”! and recognizes an 'ETGE' motif on both PALB2
and NRF2 through its KELCH domain’. Depletion of KEAP1 from
53BP1Δ cells, or deletion of the ETGE motif in full-length PALB2
(PALB2 ΔETGE), induced PALB2 ionizing-radiation-induced focus
formation in G1 cells (Fig. 1d and Extended Data Fig. 3a). Furthermore,
in cells in which KEAP1 was inactivated by genome editing (KEAP1Δ;
Extended Data Fig. 3b) we detected a stable BRCA1-PALB2-BRCA2
complex in both G1 and S phases (Fig. 1e). KEAP1 is therefore an
inhibitor of the BRCA1-PALB2 interaction. 

CUL3 also interacts with PALB2 (Extended Data Fig. 3c), and its
depletion in 53BP1Δ U2OS cells de-repressed PALB2
ionizing-radiation-induced foci in G1 (Fig. 1d and Extended Data Fig. 3a).
Furthermore, in G1-synchronized KEAP1Δ cells, expression of a
CUL3-binding-deficient KEAP1 protein that lacks its BTB domain
(ΔBTB) failed to suppress the BRCA1-PALB2 interaction, unlike its
wild-type counterpart (Extended Data Fig. 3d). These results suggest
that KEAP1 recruits CUL3 to PALB2 to suppress its interaction with
BRCA1.

Using the LacR/lacO system and co-immunoprecipitation assays, 
we found that a mutant of PALB2 lacking all eight lysine residues in 
the BRCA1-interaction domain (PALB2-KR; Fig. 2a) could interact
with BRCA1 irrespective of cell cycle position (Fig. 2b and Extended
Data Fig. 3e, f). Further mutagenesis identified residues 20, 25 and 30
in PALB2 as critical for the suppression of the BRCA1-PALB2
interaction, since reintroduction of these lysines in the context of
PALB2-KR (yielding PALB2-KR/K3; Fig. 2a) led to the suppression of
BRCA1-PALB2-BRCA2 complex assembly in G1 cells (Fig. 2b and
Extended Data Fig. 3e). Together, these results suggested a model whereby
PALB2-bound KEAP1 forms an active CRL3 complex that ubiquitylates
the PALB2 N terminus to suppress its interaction with BRCA1.

Figure 1 | Inhibition of the BRCA1-PALB2 interaction in G1 is
CRL3-KEAP1-dependent. a, Micrographs of irradiated (2 Gy)
G1-synchronized U2OS cells processed for γ-H2AX, BRCA1 and
BRCA2 immunofluorescence. DAPI, 4′,6-diamidino-2-phenylindole;
IR, ionizing radiation; WT, wild type. b, Quantitation of the experiment
shown in a and Extended Data Fig. 1d. ASN, asynchronously dividing.
Mean ± standard deviation (s.d.), N = 3. c, Immunoprecipitation (IP) of
PALB2 from extracts prepared from mock- or X-irradiated 293T cells
synchronized in S or G1 phases. A normal immunoglobulin (Ig)G
immunoprecipitation was performed as control. Cyclin A staining
confirms cell cycle synchronization. Numbers on left indicate kDa.
For gel source data, see Supplementary Fig. 1. d, Quantitation of the
experiment shown in Extended Data Fig. 3a. 53BP1Δ U2OS cells
transfected with the indicated GFP-PALB2 vectors and short interfering
(si)RNAs were irradiated (20 Gy) before being processed for microscopy
(mean ± s.d., N = 3). e, Normal IgG and PALB2 immunoprecipitations
from extracts prepared from synchronized and irradiated 293T cells of the
indicated genotypes. Numbers on left indicate kDa.

Figure 2 | Ubiquitylation of PALB2 prevents BRCA1-PALB2
interaction. a, Sequence of the PALB2 N terminus and mutants.
b, GFP immunoprecipitation (IP) of extracts derived from G1- or
S-phase-synchronized 293T cells expressing the indicated GFP-PALB2 proteins.
c, In vitro ubiquitylation of the indicated HA-tagged PALB2 proteins by
CRL3-KEAP1. d, Pulldown assay of ubiquitylated HA-PALB2 (1-103)
incubated with MBP or MBP-BRCA1-CC. I, input; FT, flow-through; PD,
pulldown. The asterisk denotes a fragment of HA-PALB2 competent for
BRCA1 binding. b-d, Numbers on left indicate kDa.

While PALB2 ubiquitylation can be detected in cells (Extended 
Data Fig. 4a), the lysine-rich nature of the PALB2 N terminus has so 
far precluded us from unambiguously mapping in vivo ubiquitylation 
sites on Lys 20, 25 or 30. However, we could detect ubiquitylation on 
Lys 16 and Lys 43 by mass spectrometry, indicating that the PALB2 N 
terminus is ubiquitylated (Extended Data Fig. 4b). In a complementary
set of experiments, PALB2 targeted to the lacO array induced
immunoreactivity to conjugated Ub (Extended Data Fig. 4c-e). Ub
colocalization with PALB2 was highest in G1 and depended on the
KEAP1-interaction motif and the presence of the Lys 20/25/30
residues (Extended Data Fig. 4d-e), consistent with the model that PALB2
is ubiquitylated on those sites in G1 cells. Indeed, we could readily
reconstitute ubiquitylation of the N terminus of PALB2 (residues 1-103;
fused to a haemagglutinin (HA) epitope tag) by recombinant
CRL3-KEAP1, in a manner that depended on the KEAP1-interaction domain
of PALB2 (Fig. 2c), and we unambiguously identified Lys 25 and Lys 
30 as being ubiquitylated by KEAP1 in vitro by mass spectrometry 
(Extended Data Fig. 5). 

Ubiquitylation of PALB2 by CRL3-KEAP1 inhibited its interaction 
with a BRCA1 fragment comprising residues 1363-1437 (BRCA1-CC), 
an inhibition that was more obvious with the highly modified forms 
of PALB2 owing to the presence of ubiquitylated lysines outside the 
BRCA1-interaction domain (Fig. 2d). To test specifically whether
ubiquitylation of a single lysine residue (of the three identified as critical)
inhibited the interaction with BRCA1, we used chemical crosslinking
to install a single Ub moiety at position 20 or 45 (yielding
PALB2-Kc20-Ub and PALB2-Kc45-Ub, respectively). Ubiquitylation of PALB2
at position 20 completely suppressed its interaction with BRCA1 
whereas modification of residue 45 had no impact on the interaction 
(Extended Data Fig. 6a), echoing the in vivo data (Extended Data 
Fig. 3e). Together, these results indicate that ubiquitylation of PALB2 
at specific sites on its N terminus prevents its interaction with BRCA1. 
Figure 3 | USP11 opposes the activity of CRL3-KEAP1. a, Normal IgG or
PALB2 immunoprecipitation (IP) of extracts derived from camptothecin
(CPT)-treated 293T cells of the indicated genotypes transfected with
GFP-USP11 constructs. EV, empty vector; CS, C318S; WT, wild type.
b, Clonogenic survival assays of 293T cells of the indicated genotypes
treated with olaparib (mean ± s.d., N ≥ 3). c, Normal IgG or PALB2
immunoprecipitation of extracts derived from CPT-treated 293T cells of
the indicated genotypes. d, Immunoblots of deubiquitylation reactions
containing ubiquitylated HA-tagged PALB2 (1-103) and increasing
concentrations of glutathione S-transferase (GST)-USP11 or its C270S
(CS) mutant. USP2 was used as a control. DUB, deubiquitylase.
e, Cell-cycle-synchronized U2OS cells were irradiated (20 Gy dose) and processed
for immunoblotting. IR, ionizing radiation. f, Immunoblots of extracts
from irradiated U2OS cells transfected with the indicated siRNAs. CTRL,
control. g, Fluorescence micrographs of G1-synchronized and irradiated
(20 Gy) 53BP1Δ U2OS cells transfected with the indicated siRNAs. The
percentage of cells with more than five γ-H2AX-colocalizing BRCA2 foci
is indicated (mean ± s.d., N = 3). Scale bars, 5 µm. a, c, d, f, Numbers to
left or right indicate kDa.


Since neither the activity of the CRL3-KEAP1 E3 ligase (Extended
Data Fig. 6b) nor the interaction of CRL3-KEAP1 with PALB2
(Extended Data Fig. 3c) is regulated by the cell cycle, we considered
the possibility that deubiquitylation of PALB2 might be regulated in a
cell-cycle-dependent manner. KEAP1 physically interacts with USP11
(ref. 22), a deubiquitylase that also interacts with BRCA2 (ref. 23) and
PALB2 (Extended Data Fig. 6c). USP11 depletion impairs gene conversion
(Extended Data Fig. 6d) and results in hypersensitivity to PARP
inhibition, identifying USP11 as a homologous recombination regulator of
unknown function. Co-immunoprecipitation experiments confirmed
that USP11 and its catalytic activity were necessary for the formation of
a stable BRCA1-PALB2-BRCA2 complex, especially in the presence of
DNA damage (Fig. 3a and Extended Data Fig. 6e, f).

If USP11 antagonizes PALB2 ubiquitylation by CRL3-KEAP1,
then removal of KEAP1 (or CUL3) should reverse the phenotypes
imparted by loss of USP11. Indeed, deletion of KEAP1 restored
resistance to PARP inhibitors (PARPi) and the BRCA1-PALB2 interaction
in USP11-knockout cells prepared by genome editing (USP11Δ;
Fig. 3b, c and Extended Data Fig. 6e). Likewise, depletion of CUL3
or KEAP1 reversed the gene conversion defect of USP11-depleted
cells (Extended Data Fig. 7a). Introduction of the PALB2-KR mutant
restored the PALB2-BRCA1 interaction and reversed PARPi sensitivity in
USP11Δ cells in a manner that depended on Lys 20/25/30 (Extended
Data Fig. 7b, c). Since recombinant USP11 can deubiquitylate PALB2
(1-103) in vitro (Fig. 3d), these results suggest that USP11 promotes
the assembly of the BRCA1-PALB2-BRCA2 complex by reversing the
inhibitory ubiquitylation on the PALB2 Lys 20/25/30 residues.

We observed that USP11 turns over rapidly in G1 cells and interacts
poorly with PALB2 in this phase of the cell cycle (Extended Data
Fig. 8a, b). Furthermore, USP11 is rapidly lost upon DNA damage
induction, specifically in G1 phase (Fig. 3e and Extended Data
Fig. 8b, c). The destabilization of USP11 after ionizing radiation
treatment is dependent on ATM signalling, whereas it is ATR-dependent
after ultraviolet irradiation (Extended Data Fig. 8d, e). The drop in
USP11 steady-state levels in G1 is the result of proteasomal degradation
(Extended Data Fig. 8f). A CRL4 E3 Ub ligase is probably responsible
for controlling the stability of USP11, as treatment with MLN4924, a
pan-CRL inhibitor (Extended Data Fig. 8g), or depletion of CUL4
(Fig. 3f), protected USP11 from DNA-damage-induced degradation.
CUL4 depletion led to ionizing-radiation-induced BRCA2 and PALB2
focus formation in G1 53BP1Δ cells (Fig. 3g and Extended Data
Fig. 9a), consistent with the regulation of USP11 by a CRL4 complex
acting as the upstream signal that ultimately controls BRCA1-PALB2-BRCA2
complex assembly.

While deletion of 53BP1 produces low levels of ssDNA in G1 cells,
combining the 53BP1Δ mutation with depletion of KEAP1 did not produce
extraction-resistant RAD51 ionizing-radiation-induced foci, suggesting
little-to-no RAD51 nucleofilament formation (Extended Data
Fig. 9b). We surmised that ssDNA formation remained insufficient
in those cells and thus took advantage of the phosphomimetic T847E
mutant of CtIP, which promotes resection in G1 cells. Unlike wild-type
CtIP, introduction of CtIP(T847E) into 53BP1Δ cells depleted of
KEAP1 induced RAD51 ionizing-radiation-induced focus formation in
G1 cells (Fig. 4a, b and Extended Data Fig. 9b, c) along with unscheduled
DNA synthesis (Extended Data Fig. 9d). These results suggested
that the steps downstream of RAD51 nucleofilament formation, that
is, strand invasion, D-loop formation and DNA synthesis, could be
activated in G1.

To test whether productive homologous recombination could also
be activated in G1, we employed a CRISPR-Cas9-stimulated gene-targeting
assay in which the insertion of the coding sequence for
the mClover fluorescent protein at the 5′ end of the lamin A (LMNA)
or PML genes was monitored by microscopy or flow cytometry
(Fig. 4c and Extended Data Fig. 9e, f), with the latter method enabling
the gating of cells with a defined DNA content (such as G1 cells). We
also established synchronization protocols in which G1 cells obtained
after release from a thymidine block were arrested in G1 by lovastatin
treatment for 24 h (Extended Data Fig. 9g, h). Using this system, we
determined a concentration of donor template in the linear range of
the assay and ascertained that gene targeting at the LMNA locus was
dependent on BRCA1-PALB2-BRCA2 complex assembly (Extended
Data Fig. 10a, b). We also confirmed that gene targeting by homologous
recombination was highly suppressed in G1 (Fig. 4d).

The combined activation of resection and BRCA1 recruitment to
DSB sites (that is, in 53BP1Δ cells expressing CtIP(T847E)) was
insufficient to stimulate gene targeting at either the LMNA or the PML locus
in G1 cells (Fig. 4e and Extended Data Fig. 10c, d). However, when the
BRCA1-PALB2 interaction was restored in resection-competent G1
cells using either KEAP1 depletion or expression of the PALB2-KR
mutant, we detected a robust increase in gene-targeting events at both
loci (Fig. 4e and Extended Data Fig. 10c, d). We note, however, that the
gene-targeting frequencies of G1 cells remained lower than those of
asynchronously dividing cells, suggesting an incomplete activation of
homologous recombination. 53BP1 inactivation and the expression of
CtIP(T847E) were both necessary for G1 homologous recombination
(Extended Data Fig. 10e, f), indicating that the simultaneous activation
of end resection and BRCA2 recruitment to DSB sites was both
necessary and sufficient to activate unscheduled recombination in this
phase of the cell cycle.

Figure 4 | Reactivation of homologous recombination in G1.
a, Quantitation of wild-type (WT) and 53BP1Δ U2OS cells co-transfected
with non-targeting (CTRL) or KEAP1 siRNAs and vectors expressing
wild-type CtIP or the T847E (TE) mutant that were synchronized
in G1, irradiated (2 Gy) and processed for γ-H2AX and RAD51
immunofluorescence (mean ± s.d., N = 3). b, Representative micrographs
from a. IR, ionizing radiation. c, Schematic of the gene-targeting assay.
d, Gene-targeting efficiency at the LMNA locus in asynchronously
dividing (ASN) and G1-arrested U2OS cells (mean ± s.d., N = 3). HR,
homologous recombination; sgRNA, single guide RNA. e, Gene targeting
at the LMNA locus in G1-arrested cells transfected with the indicated
siRNA or a PALB2-KR expression vector (mean ± s.d., N = 3). f, Model
of the cell-cycle regulation of homologous recombination.

We conclude that the regulation of BRCA1-PALB2-BRCA2 complex
assembly is a key node in the cell cycle control of DSB repair by
homologous recombination. This regulation converges on the
BRCA1-interaction site on PALB2 and is enforced by the opposing activities
of the E3 ligase CRL3-KEAP1 and the deubiquitylase USP11, with
the latter being antagonized in G1 by a CRL4 complex (Fig. 4f). Our
studies also demonstrate that the suppression of homologous
recombination in G1 cells is largely reversible and that it involves the
combined suppression of end resection and BRCA2 recruitment to DSB
sites (Fig. 4f). As most cells in the human body are not actively cycling
and are thus refractory to homologous recombination, the manipulations
described here may eventually enable therapeutic gene targeting
in a wide variety of tissues. However, these approaches may necessitate
the reversal of additional blocks to gene targeting, such as the potential
downregulation of homologous recombination factor expression in
post-mitotic cells.
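
The regulatory logic summarized in Fig. 4f can be restated as a toy Boolean model. This is a minimal sketch written for illustration only, assuming all-or-none switches; the names (`CellState`, `hr_permitted` and the helper functions) are hypothetical. The rules encode only dependencies stated in the text: in G1, resection needs both 53BP1 loss and CtIP(T847E); BRCA2 recruitment needs BRCA1 at the break (assumed to require 53BP1 loss in G1) plus a PALB2 N terminus kept free of CRL3-KEAP1 ubiquitylation, which in G1 requires KEAP1 removal, the PALB2-KR mutant, or, as inferred from Fig. 3f, g, sparing USP11 from CRL4. It is not quantitative and cannot capture the incomplete activation noted above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CellState:
    """Hypothetical all-or-none description of one cell (illustrative names)."""
    in_g1: bool
    loss_53bp1: bool = False      # 53BP1 deleted (53BP1Δ)
    ctip_t847e: bool = False      # phosphomimetic CtIP(T847E) expressed
    keap1_depleted: bool = False  # KEAP1 (or CUL3) removed
    palb2_kr: bool = False        # PALB2-KR: Lys 20/25/30 acceptor sites mutated
    cul4_depleted: bool = False   # CUL4 removed, so USP11 is not degraded in G1


def resection_active(c: CellState) -> bool:
    # Outside G1 resection is assumed to be on; in G1 both 53BP1 loss
    # and CtIP(T847E) are required.
    return (not c.in_g1) or (c.loss_53bp1 and c.ctip_t847e)


def brca1_at_break(c: CellState) -> bool:
    # Assumption: in G1, BRCA1 reaches DSB sites only when 53BP1 is absent.
    return (not c.in_g1) or c.loss_53bp1


def palb2_free_to_bind_brca1(c: CellState) -> bool:
    # USP11 is assumed to be available outside G1, or in G1 when CUL4 is depleted.
    usp11_available = (not c.in_g1) or c.cul4_depleted
    # The PALB2 N terminus stays unmodified if USP11 reverses CRL3-KEAP1
    # activity, if KEAP1 is gone, or if the acceptor lysines are mutated.
    return usp11_available or c.keap1_depleted or c.palb2_kr


def brca2_recruited(c: CellState) -> bool:
    return brca1_at_break(c) and palb2_free_to_bind_brca1(c)


def hr_permitted(c: CellState) -> bool:
    # Homologous recombination needs both resection and BRCA2 recruitment.
    return resection_active(c) and brca2_recruited(c)


if __name__ == "__main__":
    g1_resecting = dict(in_g1=True, loss_53bp1=True, ctip_t847e=True)
    for label, state in [
        ("S/G2", CellState(in_g1=False)),
        ("G1, unperturbed", CellState(in_g1=True)),
        ("G1, 53BP1Δ + CtIP-TE", CellState(**g1_resecting)),
        ("G1, 53BP1Δ + CtIP-TE + siKEAP1", CellState(**g1_resecting, keap1_depleted=True)),
        ("G1, 53BP1Δ + CtIP-TE + PALB2-KR", CellState(**g1_resecting, palb2_kr=True)),
    ]:
        print(f"{label}: HR permitted = {hr_permitted(state)}")
```

Each rule is a single Boolean expression, so testing an alternative assumption (for example, letting CtIP(T847E) alone license resection in G1) changes one line.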
METHODS

Plasmids. The complementary DNA of PALB2 was obtained from the Mammalian
Gene Collection (MGC). Full-length PALB2 and BRCA1 were amplified by PCR,
subcloned into pDONR221 and delivered into the pDEST-GFP, pDEST-Flag
and mCherry-LacR vectors using Gateway cloning technology (Invitrogen).
Similarly, the coiled-coil domain of BRCA1 (residues 1363-1437) was amplified
by PCR, subcloned into the pDONR221 vector and delivered into both
mCherry-LacR and pDEST-GFP vectors. The N-terminal domain of PALB2 was amplified
by PCR and introduced into the GST expression vector pET30-2-His-GST-TEV
using the EcoRI/XhoI sites. The coiled-coil domain of BRCA1 was cloned into
pMAL-c2 using the BamHI/SalI sites. Truncated forms of PALB2 were obtained
by introducing stop codons or deletions through site-directed mutagenesis.
Full-length CtIP was amplified by PCR, subcloned into pDONR221 and delivered
into the lentiviral construct pCW57.1 (a gift from D. Root; Addgene plasmid
#41393) using Gateway cloning technology (Invitrogen). The USP11 cDNA was
a gift from D. Cortez and was amplified by PCR and cloned into the pDsRed2-C1
vector using the EcoRI/SalI sites. The bacterial codon-optimized coding sequence
of pig USP11 was subcloned into the 6×His-GST vector pETM-30-Htb using
the BamHI/EcoRI sites. siRNA-resistant versions of PALB2, BRCA1 and USP11
constructs were generated as previously described. Full-length CUL3 and RBX1
were amplified by PCR from a human pancreas cDNA library (Invitrogen) as
previously described and cloned into the dual expression pFBDM vector using
NheI/XmaI and BssHII/NotI, respectively. The NEDD8 cDNA was a gift from
D. Xirodimas and was fused to a double StrepII tag at its C terminus in the pET17b
vector (Millipore). Human DEN1 was amplified from a vector supplied by
A. Echalier, fused to a non-cleavable N-terminal StrepII2x tag by PCR and
inserted into a pET17b vector. The pPCOOL-mKEAP1 plasmid was a gift from
E. Shao. The pcDNA3-HA2-KEAP1 and pcDNA3-HA2-KEAP1ΔBTB plasmids were gifts
from Y. Xiong (Addgene plasmids #21556 and 21593). gRNAs were synthesized
and processed as described previously. Annealed gRNAs were cloned into the
Cas9-expressing vectors pSpCas9(BB)-2A-Puro (PX459) or
pX330-U6-Chimeric_BB-CBh-hSpCas9, gifts from F. Zhang (Addgene plasmids #48139 and 42230).
The gRNAs targeting the LMNA or the PML locus and the mClover-tagged LMNA or
PML constructs have been described previously. The lentiviral packaging vector
psPAX2 and the envelope vector VSV-G were gifts from D. Trono (Addgene plasmids
#12260 and 12259). His6-Ub was cloned into the pcDNA5-FRT/TO backbone using
the XhoI/HindIII sites. All mutations were introduced by site-directed mutagenesis using
QuikChange (Stratagene) and all plasmids were sequence-verified.

Cell culture and plasmid transfection. All culture media were supplemented with 
10% fetal bovine serum (FBS). U-2-OS (U2OS) cells were cultured in McCoy’s 
medium (Gibco). 293T cells were cultured in DMEM (Gibco). Parental cells were 
tested for mycoplasma contamination and authenticated by STR DNA profiling. 
Plasmid transfections were carried out using Lipofectamine 2000 Transfection
Reagent (Invitrogen) following the manufacturer's protocol. Lentiviral infection
was carried out as previously described. U2OS and 293T cells were purchased
from ATCC. U2OS 256 cells were a gift from R. Greenberg.

Antibodies. We employed the following antibodies: rabbit anti-53BP1
(A300-273A, Bethyl), rabbit anti-53BP1 (sc-22760, Santa Cruz), mouse anti-53BP1
(#612523, BD Biosciences), mouse anti-γ-H2AX (clone JBW301, Millipore),
rabbit anti-γ-H2AX (#2577, Cell Signaling Technologies), rabbit anti-KEAP1
(ab66620, Abcam), rabbit anti-NRF2 (ab62352, Abcam), mouse anti-Flag
(clone M2, Sigma), mouse anti-tubulin (CP06, Calbiochem), mouse anti-GFP
(411814460001, Roche), mouse anti-CCNA (MONX10262, Monosan), rabbit
anti-BRCA2 (ab9143, Abcam), mouse anti-BRCA2 (OP95, Calbiochem), rabbit
anti-BRCA1 (#07-434, Millipore), rabbit anti-USP11 (ab109232, Abcam), rabbit
anti-USP11 (A301-613A, Bethyl), rabbit anti-RAD51 (#70-001, Bioacademia),
mouse anti-BrdU (RPN202, GE Healthcare), mouse anti-FK2 (BML-PW8810,
Enzo), rabbit anti-PALB2 (ref. 32), rabbit anti-GST (sc-459, Santa Cruz), rabbit
anti-CUL3 (A301-108A, Bethyl), mouse anti-MBP (E8032, NEB), mouse anti-HA
(clone 12CA5, a gift from M. Tyers), rabbit anti-ubiquitin (Z0458, Dako) and
mouse anti-actin (CP01, Calbiochem). The following antibodies were used as
secondary antibodies in immunofluorescence microscopy: Alexa Fluor 488 donkey
anti-rabbit IgG, Alexa Fluor 488 donkey anti-goat IgG, Alexa Fluor 555 donkey
anti-mouse IgG, Alexa Fluor 555 donkey anti-rabbit IgG, Alexa Fluor 647
donkey anti-mouse IgG, Alexa Fluor 647 donkey anti-human IgG and Alexa Fluor 647
donkey anti-goat IgG (Molecular Probes).

RNA interference. All siRNAs employed in this study were single duplex
siRNAs purchased from ThermoFisher. RNA interference (RNAi) transfections were
performed using Lipofectamine RNAiMax (Invitrogen) in a forward transfection
mode. The individual siRNA duplexes used were: BRCA1 (D-003461-05), PALB2
(D-012928-04), USP11 (D-006063-01), CUL1 (M-004086-01), CUL2 (M-007277-00),
CUL3 (M-010224-02), CUL4A (M-012610-01), CUL4B (M-017965-01), CUL5
(M-019553-01), KEAP1 (D-12453-02), RAD51 (M-003530-04), CtIP/RBBP8
(M-001376-00), BRCA2 (D-003462-04), 53BP1 (D-003549-01) and non-targeting
control siRNA (D-001210-02). Except when stated otherwise, siRNAs were
transfected 48 h before cell processing.

Inhibitors and fine chemicals. We employed the following drugs at the
indicated concentrations: cycloheximide (CHX; Sigma) at 100 ng ml⁻¹, camptothecin
(CPT; Sigma) at 0.2 µM, ATM inhibitor (KU55933; Selleck Chemicals) at 10 µM,
ATR inhibitor (VE-821; a gift from P. Reaper) at 10 µM, DNA-PKcs inhibitor
(NU7441; Genetex) at 10 µM, proteasome inhibitor MG132 (Sigma) at 2 µM,
lovastatin (S2061; Selleck Chemicals) at 40 µM, doxycycline (#8634-1; Clontech),
NEDD8-activating enzyme inhibitor (MLN4924; Active Biochem) at 5 µM and
olaparib (Selleck) at the indicated concentrations.

Immunofluorescence microscopy. In most cases, cells were grown on glass 
coverslips, fixed with 2% (w/v) paraformaldehyde in PBS for 20 min at room 
temperature, permeabilized with 0.3% (v/v) Triton X-100 for 20 min at room 
temperature and blocked with 5% BSA in PBS for 30 min at room temperature. 
Alternatively, cells were fixed with 100% cold methanol for 10 min at −20 °C and
subsequently washed with PBS for 5 min at room temperature before PBS-BSA
blocking. Cells were then incubated with the primary antibody diluted in PBS-BSA
for 2 h at room temperature. Cells were next washed with PBS and then incubated
for 1 h at room temperature with secondary antibodies diluted in PBS-BSA
supplemented with 0.8 µg ml⁻¹ DAPI to stain DNA. The coverslips were mounted
onto glass slides with ProLong Gold mounting agent (Invitrogen). Confocal images
were taken using a Zeiss LSM780 laser-scanning microscope. For G1 versus S/G2
analysis of the BRCA1-PALB2-BRCA2 axis, cells were first synchronized
with a double-thymidine block, released to allow entry into S phase, exposed
to 2 or 20 Gy of X-irradiation at 5 h and 12 h post-release and fixed 1 to 5 h
post-treatment (where indicated). For the examination of DNA replication, cells
were pre-incubated with 30 µM BrdU for 30 min before irradiation and processed
as previously described.
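
The Fig. 3g legend scores a cell as positive when it carries more than five γ-H2AX-colocalizing BRCA2 foci, and reports mean ± s.d. over N = 3 experiments. A minimal sketch of that tally follows, assuming per-cell foci counts have already been extracted from the images (the counting step is not described here) and assuming the s.d. is the sample standard deviation across experiments; the function names and counts are hypothetical.

```python
from statistics import mean, stdev
from typing import Sequence, Tuple

FOCI_THRESHOLD = 5  # positive = strictly more than five colocalizing foci


def percent_positive(foci_per_cell: Sequence[int], threshold: int = FOCI_THRESHOLD) -> float:
    """Percentage of scored cells whose foci count exceeds `threshold`."""
    if not foci_per_cell:
        raise ValueError("no cells were scored")
    positive = sum(1 for n in foci_per_cell if n > threshold)
    return 100.0 * positive / len(foci_per_cell)


def summarize_replicates(experiments: Sequence[Sequence[int]]) -> Tuple[float, float]:
    """Mean and sample s.d. of per-experiment percentages (needs >= 2 experiments)."""
    percentages = [percent_positive(cells) for cells in experiments]
    return mean(percentages), stdev(percentages)


if __name__ == "__main__":
    # Made-up foci counts for three experiments, for illustration only.
    runs = [[0, 2, 7, 12, 5], [1, 6, 9, 0, 3], [8, 0, 0, 4, 15]]
    m, sd = summarize_replicates(runs)
    print(f"{m:.1f} ± {sd:.1f} % of cells with >{FOCI_THRESHOLD} foci")
```

Keeping the threshold as a named constant makes the "more than five" criterion explicit; a cell with exactly five foci is scored negative.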

CRISPR-Cas9 genome editing of USP11/KEAP1. 293T and U2OS cells were
transiently transfected with three distinct sgRNAs targeting 53BP1, USP11
or KEAP1, expressed from the pX459 vector containing Cas9 followed by the
2A-puromycin cassette. The next day, cells were selected with puromycin for 2 days
and subcloned to form single colonies or subpopulations. Clones were screened
by immunoblot and/or immunofluorescence to verify the loss of 53BP1, USP11 or
KEAP1 expression and were subsequently characterized by PCR and sequencing. The
genomic region targeted by CRISPR-Cas9 was amplified by PCR using Turbo
Pfu polymerase (Agilent) and the PCR product was cloned into the pCR2.1 TOPO
vector (Invitrogen) before sequencing.
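
Extended Data Fig. 6e reports the indels found in the knockout clones and their frequencies. The sketch below shows one hypothetical way to tally sequenced TOPO clones, assuming each clone has already been reduced to a net indel length in base pairs (negative for deletions, 0 for unedited). It also applies a common heuristic that the Methods do not state: a clone is a likely knockout when every observed allele shifts the reading frame (length not divisible by three). That heuristic misses in-frame deletions that still destroy function and large rearrangements that PCR may not amplify.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def tally_indels(net_indels: Iterable[int]) -> Tuple[Dict[int, float], bool]:
    """Return {indel length: fraction of TOPO clones} and whether all alleles are frameshifts.

    `net_indels` holds one net length per sequenced TOPO clone (bp; +insertion,
    -deletion, 0 = unedited). Python's % is non-negative here, so -1 % 3 == 2.
    """
    counts = Counter(net_indels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no sequenced clones")
    fractions = {size: n / total for size, n in sorted(counts.items())}
    all_frameshift = all(size % 3 != 0 for size in counts)
    return fractions, all_frameshift


if __name__ == "__main__":
    # Hypothetical TOPO clones from one cell line carrying -1 bp and +2 bp alleles.
    fractions, knockout_like = tally_indels([-1, -1, 2, -1, 2, -1])
    print(fractions, knockout_like)
```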

Olaparib clonogenic assay. 293T cells were incubated with the indicated doses of
olaparib (Selleck Chemicals) for 24 h, washed once with PBS and counted by trypan
blue staining. Five hundred cells were then plated in duplicate for each condition.
The cell survival assay was performed as previously described.
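
The survival calculation is cited rather than spelled out. A common way to express clonogenic data, and the assumption behind this sketch, is the surviving fraction: the mean colony count on treated plates divided by the count expected from the untreated plating efficiency, with duplicate plates averaged. The 500-cell seeding density comes from the text; the function names and colony counts are hypothetical.

```python
from statistics import mean
from typing import Dict, Sequence

CELLS_SEEDED = 500  # cells plated per dish, as stated in the Methods


def surviving_fraction(treated: Sequence[int], untreated: Sequence[int],
                       seeded: int = CELLS_SEEDED) -> float:
    """Colonies on treated plates relative to the untreated plating efficiency."""
    plating_efficiency = mean(untreated) / seeded
    if plating_efficiency == 0:
        raise ValueError("untreated plates produced no colonies")
    return mean(treated) / (seeded * plating_efficiency)


def dose_response(counts: Dict[float, Sequence[int]]) -> Dict[float, float]:
    """Map each olaparib dose to its surviving fraction; dose 0.0 is the reference."""
    reference = counts[0.0]
    return {dose: surviving_fraction(plates, reference)
            for dose, plates in sorted(counts.items())}


if __name__ == "__main__":
    # Made-up duplicate colony counts (dose in arbitrary units).
    counts = {0.0: [210, 190], 1.0: [150, 130], 10.0: [40, 36]}
    for dose, sf in dose_response(counts).items():
        print(f"dose {dose}: surviving fraction {sf:.2f}")
```

When the same number of cells is seeded for every condition, the formula reduces to the ratio of mean colony counts; keeping the plating efficiency explicit makes the reduction visible if seeding ever differs between conditions.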

Recombinant protein production. GST and MBP fusion proteins were produced
as previously described. Briefly, MBP proteins expressed in Escherichia coli were
purified on amylose resin (New England Biolabs) according to the batch method
described by the manufacturer and stored in 1× PBS, 5% glycerol. GST proteins
expressed in E. coli were purified on glutathione Sepharose 4B (GE Healthcare)
resin in 50 mM Tris-HCl pH 7.5, 300 mM NaCl, 2 mM dithiothreitol (DTT),
1 mM EDTA, 15 µg ml⁻¹ AEBSF and 1× complete protease inhibitor cocktail
(Roche). Upon elution from the resin using 50 mM glutathione in 50 mM
Tris-HCl pH 8, 2 mM DTT, the His6-GST tag was cleaved off using His-tagged TEV
protease (provided by F. Sicheri) in 50 mM Tris-HCl pH 7.5, 150 mM NaCl, 10 mM
glutathione, 10% glycerol, 2 mM sodium citrate and 2 mM β-mercaptoethanol.
His-tagged proteins were depleted using Ni-NTA agarose beads (Qiagen) in
50 mM Tris-HCl pH 7.5, 300 mM NaCl, 20 mM imidazole, 5 mM glutathione,
10% glycerol, 1 mM sodium citrate and 2 mM β-mercaptoethanol, followed by
centrifugal concentration (Amicon centrifugal filters, Millipore). GST-mKEAP1
was purified as described previously, with an additional anion exchange step on
a HiTrap Q HP column (GE Healthcare). The GST tag was left on the protein for
in vitro experiments. Purification of CUL3 and RBX1 was performed as
previously described. NEDD8 (gift from D. Xirodimas) and DEN1 were expressed
in E. coli BL21 grown in Terrific broth media and induced overnight with 0.5 mM
isopropyl-β-D-thiogalactoside (IPTG) at 16 °C. Cells were harvested and
resuspended in wash buffer (400 mM NaCl, 50 mM Tris-HCl, pH 8, 5% glycerol, 2 mM
DTT), supplemented with lysozyme, universal nuclease (Pierce), benzamidine,
leupeptin, pepstatin, PMSF and complete protease inhibitor cocktail (Roche),
except for DEN1-expressing cells, for which the protease inhibitors were omitted.
Cells were lysed by sonication and the lysate was cleared by centrifugation at
20,000 r.p.m. for 50 min. The soluble supernatant was bound to a 5 ml
Strep-Tactin Superflow Cartridge at a flow rate of 3 ml min⁻¹ using a peristaltic pump.
The column was washed with 20 column volumes (CV) of wash buffer and
eluted with 5 CV wash buffer, diluted 1:2 in water to reduce the final salt
concentration and supplemented with 2.5 mM desthiobiotin. The elution
fractions were pooled and concentrated to a total volume of 4 ml using a 3 kDa cut-off
Amicon concentrator. DEN1 was further purified over a Superdex 75 size-exclusion
column and buffer exchanged into 150 mM NaCl, HEPES, pH 7.6, 2% glycerol and
1 mM DTT. The C-terminal pro-peptide and StrepII2x tag of NEDD8 were removed by
incubation with StrepII2x-DEN1 in a 1:20 molar ratio for 1 h at room temperature.
The DEN1 cleavage reaction was buffer exchanged on a Zeba MWCO desalting
column (Pierce) to remove the desthiobiotin, and passed through a Strep-Tactin
Cartridge, which retains the C-terminal pro-peptide and DEN1. The GST-tagged
Sus scrofa (pig) USP11 proteins were expressed in E. coli as described. Cells were
lysed by lysozyme treatment and sonication in 50 mM Tris pH 7.5, 300 mM NaCl,
1 mM EDTA, 1 mM AEBSF, 1× protease inhibitor mix (284 ng ml⁻¹ leupeptin,
1.37 µg ml⁻¹ pepstatin A, 170 µg ml⁻¹ PMSF and 330 µg ml⁻¹ benzamidine) and 5%
glycerol. Cleared lysate was applied to a column packed with glutathione Sepharose
4B (GE Healthcare) and washed extensively with lysis buffer before elution in 50 mM
Tris pH 7.5, 150 mM NaCl, 5% glycerol and 25 mM reduced glutathione. DUB
activity was assayed on fluorogenic ubiquitin-AMC (Enzo Life Sciences) and measured
using a Synergy Neo microplate reader (BioTek). His6-TEV-ubiquitin-G76C was
purified on chelating HiTrap resin, following the manufacturer's instructions,
followed by size-exclusion chromatography on an S-75 column (GE Healthcare).
The protein was extensively dialysed in 1 mM acetic acid and lyophilized.

In vitro ubiquitylation and deubiquitylation of PALB2. HA-tagged N-terminal
fragments of PALB2 (1-103) (1 µM) were ubiquitylated in vitro using 50 µM
wild-type ubiquitin (Ub WT, Boston Biochem) or lysine-less ubiquitin (Ub-K0, Boston
Biochem), 100 nM human UBA1 (E1), 500 nM CDC34 (provided by F. Sicheri
and D. Ceccarelli), 250 nM neddylated CUL3/RBX1, 375 nM GST-mKEAP1 and
1.5 mM ATP in a buffer containing 50 mM Tris-HCl pH 7.5, 20 mM NaCl, 10 mM
MgCl2 and 0.5 mM DTT. Ubiquitylation reactions were carried out at 37 °C for 1 h,
unless stated otherwise. For USP11-mediated deubiquitylation assays, HA-PALB2
(1-103) was first ubiquitylated using lysine-less ubiquitin, with enzyme
concentrations as described above, in 50 µl reactions in a buffer containing 25 mM HEPES
pH 8, 150 mM NaCl, 10 mM MgCl2, 0.5 mM DTT and 1.5 mM ATP for 1.5 h at
37 °C. Reactions were stopped by the addition of 1 unit apyrase (New England
Biolabs). Reaction products were mixed at a 1:1 ratio with wild-type or catalytically
inactive (C270S) USP11, or USP2 (provided by F. Sicheri and E. Zeqiraj), at final
concentrations of 100 nM, 500 nM and 2,500 nM (USP11) and 500 nM (USP2),
and incubated for 2 h at 30 °C in a buffer containing 25 mM HEPES pH 8, 150 mM
NaCl, 2 mM DTT, 0.1 mg ml⁻¹ BSA, 0.03% Brij-35, 5 mM MgCl2 and 0.375 mM ATP.
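
Because the ubiquitylated substrate and the DUB were combined 1:1, every component is halved on mixing, so each enzyme solution must be prepared at twice its stated final concentration. The sketch below does that arithmetic for the stated USP11 and USP2 doses. The substrate concentration in the DUB assay is not given explicitly; the 1 µM value used here is an assumption carried over from the ubiquitylation reactions, and the enzyme:substrate ratios are only as good as that assumption. All names are hypothetical.

```python
from typing import Dict

MIX_DILUTION = 2.0  # a 1:1 mix halves every component


def stock_needed(final_nM: float, dilution: float = MIX_DILUTION) -> float:
    """Concentration the enzyme solution must have before the 1:1 mix."""
    return final_nM * dilution


def enzyme_to_substrate(final_enzyme_nM: float, substrate_before_mix_nM: float,
                        dilution: float = MIX_DILUTION) -> float:
    """Molar ratio of enzyme to (assumed) substrate in the final reaction."""
    return final_enzyme_nM / (substrate_before_mix_nM / dilution)


ASSUMED_PALB2_nM = 1000.0  # assumption: 1 µM PALB2 (1-103), as in the ubiquitylation step
FINAL_DOSES_nM: Dict[str, float] = {
    "USP11 low": 100.0, "USP11 mid": 500.0, "USP11 high": 2500.0, "USP2": 500.0,
}

if __name__ == "__main__":
    for name, final in FINAL_DOSES_nM.items():
        print(f"{name}: prepare {stock_needed(final):.0f} nM, "
              f"E:S = {enzyme_to_substrate(final, ASSUMED_PALB2_nM):.1f}")
```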

Pulldown experiments between purified PALB2 and BRCA1. PALB2 in vitro
ubiquitylation reaction products were diluted in a buffer at final concentrations
of 50 mM Tris-HCl pH 7.5, 150 mM NaCl, 5 mM MgCl2, 0.25 mM DTT and 0.1%
NP-40. Twenty micrograms of MBP or MBP-BRCA1-CC was coupled to amylose
resin (New England Biolabs) in the above buffer supplemented with 0.1% BSA
before addition of the ubiquitylation products. Pulldown reactions were performed
at 4 °C for 2 h, followed by extensive washing.

Co-immunoprecipitation. Cells were collected by trypsinization, washed once
with PBS and lysed on ice in 500 µl of lysis buffer (20 mM Tris-HCl pH 8.0, 150 mM
NaCl, 10% glycerol, 2 mM EDTA, 1% NP-40, complete protease inhibitor cocktail
(Roche), phosphatase inhibitor cocktail (Sigma) and N-ethylmaleimide to
inhibit deubiquitylation). Lysates were centrifuged at 15,000g for 10 min
at 4 °C and protein concentration was evaluated using absorbance at 280 nm.
Equivalent amounts of protein (~0.5-1 mg) were incubated with 2 µg of rabbit
anti-PALB2, rabbit anti-USP11 or rabbit anti-GFP antibody, or normal
rabbit IgG, for 5 h at 4 °C. A mix of protein A/protein G-Sepharose beads (Thermo
Scientific) was added for an additional hour. Beads were collected by centrifugation,
washed twice with lysis buffer and once with PBS, and eluted by boiling in
2× Laemmli buffer before analysis by SDS-PAGE and immunoblotting. For mass
spectrometry analysis of Flag-PALB2, 150 × 10⁶ transiently transfected HEK293T
cells were lysed in high-salt lysis buffer (50 mM Tris-HCl pH 7.5, 300 mM NaCl,
1 mM EDTA, 1% Triton X-100, 3 mM MgCl2, 3 mM CaCl2) supplemented with
complete protease inhibitor cocktail (Roche), 4 mM 1,10-phenanthroline, 50 U
benzonase and 50 U micrococcal nuclease. Cleared lysates were incubated with
Flag-M2 agarose (Sigma), followed by extensive washing in lysis buffer and 50 mM
ammonium bicarbonate.

Mass spectrometry. After immunoprecipitation of transiently transfected
Flag-PALB2 from siCTRL-transfected or USP11 siRNA-depleted 293T cells, cysteine
residues were reduced and alkylated on beads using 10 mM DTT (30 min at 56 °C)
and 15 mM 2-chloroacetamide (1 h at room temperature), respectively. Proteins
were digested using limited trypsin digestion on beads (1 µg trypsin (Worthington)
per sample, 20 min at 37 °C) and dried to completeness. For LC-MS/MS analysis,
peptides were reconstituted in 5% formic acid and loaded onto a 12 cm fused
silica column with pulled tip packed in-house with 3.5 µm Zorbax C18 (Agilent
Technologies). Samples were analysed using an Orbitrap Velos (Thermo Scientific)
coupled to an Eksigent nanoLC ultra (AB SCIEX). Peptides were eluted from the
column using a 90 min linear gradient from 2% to 35% acetonitrile in 0.1% formic
acid. Tandem MS spectra were acquired in a data-dependent mode for the top two
most abundant multiply charged peptides and included targeted scans for five
specific N-terminal PALB2 tryptic digest peptides (charge states 1+, 2+, 3+), either
in non-modified form or including a diGly-ubiquitin trypsin digestion remnant.
Tandem MS spectra were acquired using collision-induced dissociation. Spectra
were searched against the human Refseq_V53 database using Mascot, allowing
up to four missed cleavages and including carbamidomethyl (C), deamidation
(NQ), oxidation (M), GlyGly (K) and LeuArgGlyGly (K) as variable modifications.
In vitro ubiquitylated HA-PALB2 (1-103) (50 µl total reaction mix) was run
briefly onto an SDS-PAGE gel, followed by total lane excision, in-gel reduction
using 10 mM DTT (30 min at 56 °C), alkylation using 50 mM 2-chloroacetamide
and trypsin digestion for 16 h at 37 °C. Digested peptides were mixed with 20 µl of
a mix of 10 unique heavy-isotope-labelled N-terminal PALB2 (AQUA) peptides
(covering full or partial tryptic digests of regions surrounding Lys 16, 25, 30 or 43,
either in non-modified or diG-modified form; 80-1,200 fmol µl⁻¹ per peptide,
based on individual peptide sensitivity testing) before loading 6 µl onto a 12 cm
fused silica column with pulled tip packed in-house with 3.5 µm Zorbax C18.
Samples were measured on an Orbitrap ELITE (Thermo Scientific) coupled to an
Eksigent nanoLC ultra (AB SCIEX). Peptides were eluted from the column using a
180 min linear gradient from 2% to 35% acetonitrile in 0.1% formic acid. Tandem
MS spectra were acquired in a data-dependent mode for the top two most abundant
multiply charged ions and included targeted scans for the ten specific N-terminal
PALB2 tryptic digest peptides (charge states 1+, 2+, 3+), either in light or heavy
isotope-labelled form. Tandem MS spectra were acquired using collision-induced
dissociation. Spectra were searched against the human Refseq_V53 database using
Mascot, allowing up to two missed cleavages and including carbamidomethyl (C),
deamidation (NQ), oxidation (M), GlyGly (K) and LeuArgGlyGly (K) as variable
modifications, after which spectra were manually validated.
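
Two calculations underlie the targeted scans and AQUA quantitation described above: the m/z at which a diGly-modified peptide is expected in each charge state, and the conversion of a light/heavy signal ratio into an absolute amount. A minimal sketch follows, using standard monoisotopic residue masses. The GlyGly and LeuArgGlyGly shifts are computed as sums of their residues. The peptide "GK" and all peak areas are toy values, not PALB2 data, and the AQUA conversion assumes that light and heavy forms ionize identically and respond linearly; the actual analysis was performed in Mascot with manual validation.

```python
from typing import Iterable

PROTON = 1.007276  # Da
WATER = 18.010565  # Da, added once per linear peptide

# Monoisotopic residue masses (Da).
RESIDUE = {
    "G": 57.021464, "A": 71.037114, "S": 87.032028, "P": 97.052764,
    "V": 99.068414, "T": 101.047679, "C": 103.009185, "L": 113.084064,
    "I": 113.084064, "N": 114.042927, "D": 115.026943, "Q": 128.058578,
    "K": 128.094963, "E": 129.042593, "M": 131.040485, "H": 137.058912,
    "F": 147.068414, "R": 156.101111, "Y": 163.063329, "W": 186.079313,
}

# Variable modifications used in the searches, as mass shifts (Da).
MOD_SHIFT = {
    # Ubiquitin remnant on K after trypsin cleaves after ubiquitin Arg74.
    "GlyGly": 2 * RESIDUE["G"],
    # Remnant when trypsin instead cleaves after ubiquitin Arg72.
    "LeuArgGlyGly": RESIDUE["L"] + RESIDUE["R"] + 2 * RESIDUE["G"],
    "Carbamidomethyl": 57.021464,  # on C after chloroacetamide alkylation
    "Oxidation": 15.994915,        # on M
    "Deamidation": 0.984016,       # on N/Q
}


def peptide_mass(sequence: str, mods: Iterable[str] = ()) -> float:
    """Neutral monoisotopic mass of a peptide plus any listed modifications."""
    return sum(RESIDUE[aa] for aa in sequence) + WATER + sum(MOD_SHIFT[m] for m in mods)


def mz(neutral_mass: float, charge: int) -> float:
    """m/z of the [M + zH]z+ ion."""
    return (neutral_mass + charge * PROTON) / charge


def aqua_amount_fmol(light_area: float, heavy_area: float, heavy_spiked_fmol: float) -> float:
    """Endogenous amount = (light / heavy signal) x spiked heavy standard."""
    if heavy_area <= 0:
        raise ValueError("heavy standard not detected")
    return light_area / heavy_area * heavy_spiked_fmol


if __name__ == "__main__":
    toy = "GK"  # hypothetical dipeptide, not a PALB2 sequence
    for label, mods in [("unmodified", ()), ("diGly on K", ("GlyGly",))]:
        mass = peptide_mass(toy, mods)
        print(label, [round(mz(mass, z), 4) for z in (1, 2, 3)])
    print(aqua_amount_fmol(light_area=2.0e6, heavy_area=1.0e6, heavy_spiked_fmol=80.0))
```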

His-Ub pulldown. 293 Flp-In cells stably expressing His6-Ub were transfected
with the indicated siRNA and treated with doxycycline (DOX) for 24 h to
induce His6-Ub expression. Cells were pre-treated with 10 mM N-ethylmaleimide
for 30 min and lysed in denaturing lysis buffer (6 M guanidinium-HCl,
0.1 M Na2HPO4/NaH2PO4, 10 mM Tris-HCl, 5 mM imidazole, 0.01 M
β-mercaptoethanol, complete protease inhibitor cocktail). Lysates were sonicated
on ice twice for 10 s with a 1 min break and centrifuged at 15,000g for 10 min at
4 °C. The supernatant was incubated with Ni-NTA agarose beads (Qiagen) for 4 h
at 4 °C. Beads were collected by centrifugation, washed once with denaturing
lysis buffer, once with wash buffer (8 M urea, 0.1 M Na2HPO4/NaH2PO4, 10 mM
Tris-HCl, 5 mM imidazole, 0.01 M β-mercaptoethanol, complete protease inhibitor
cocktail) and twice with wash buffer supplemented with 0.1% Triton X-100, and
eluted in elution buffer (0.2 M imidazole, 0.15 M Tris-HCl, 30% glycerol, 0.72 M
β-mercaptoethanol, 5% SDS) before analysis by SDS-PAGE and immunoblotting.

Homologous-recombination-based repair assays. Parental U2OS cells and U2OS
cells stably expressing wild-type CtIP or the CtIP(T847E) mutant were transfected
with the indicated siRNA and the PALB2-KR construct, synchronized with a single
thymidine block, treated with doxycycline to induce CtIP expression and
subsequently blocked in G1 phase by adding 40 µM lovastatin. Cells were collected by
trypsinization, washed once with PBS and electroporated with 2.5 µg of sgRNA
plasmid and 2.5 µg of donor template using the Nucleofector technology (Lonza;
protocol X-001). Cells were plated in medium supplemented with 40 µM lovastatin
and grown for 24 h before flow cytometry analysis.
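
The readout is the percentage of mClover-positive cells among cells gated by DNA content. A minimal sketch of that gating follows, assuming per-event DNA-content and mClover intensities have been exported from the cytometer. The G1 window and positivity threshold are hypothetical inputs; in practice they would be set from DNA-content histograms and from no-donor or no-sgRNA controls.

```python
from typing import Sequence, Tuple


def gene_targeting_percent(dna_content: Sequence[float],
                           mclover: Sequence[float],
                           g1_window: Tuple[float, float],
                           mclover_threshold: float) -> float:
    """Percentage of mClover-positive events within a G1 DNA-content gate."""
    if len(dna_content) != len(mclover):
        raise ValueError("each event needs both a DNA-content and an mClover value")
    low, high = g1_window
    gated = [c for d, c in zip(dna_content, mclover) if low <= d <= high]
    if not gated:
        raise ValueError("no events fall inside the G1 gate")
    positive = sum(1 for c in gated if c > mclover_threshold)
    return 100.0 * positive / len(gated)


if __name__ == "__main__":
    # Toy events: DNA content in arbitrary units (G1 ~ 1.0, G2/M ~ 2.0).
    dna = [1.0, 1.05, 2.0, 0.95, 1.9, 1.1]
    clover = [12.0, 640.0, 900.0, 20.0, 15.0, 30.0]
    pct = gene_targeting_percent(dna, clover, g1_window=(0.85, 1.15), mclover_threshold=100.0)
    print(f"{pct:.1f}% mClover-positive among G1-gated events")
```

Gating before scoring matters here: the bright G2/M event in the toy data is excluded, so it cannot inflate the G1 gene-targeting frequency.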

PALB2 chemical ubiquitylation. PALB2 (1-103) polypeptides, engineered with
only one cross-linkable cysteine, were ubiquitylated by cross-linking alkylation,
as previously described, with the following modifications. Purified PALB2
cysteine mutant (final concentration of 600 µM) was mixed with
His6-TEV-ubiquitin G76C (350 µM) in 300 mM Tris pH 8.8, 120 mM NaCl and 5% glycerol.
Tris(2-carboxyethyl)phosphine (TCEP) (Sigma-Aldrich) reducing agent was added
to the mixture to a final concentration of 6 mM and incubated for 30 min at room
temperature. The bi-reactive cysteine cross-linker 1,3-dichloroacetone
(Sigma-Aldrich) was dissolved in dimethylformamide and added to the protein mix to
a final concentration of 5.25 mM. The reaction was allowed to proceed on ice for
1 h before being quenched by the addition of 5 mM β-mercaptoethanol.
His6-TEV-ubiquitin-conjugated PALB2 was enriched by passing over Ni-NTA agarose
beads (Qiagen).
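
The stated concentrations fix the stoichiometry of the crosslinking reaction. This short sketch only restates them as molar ratios; the values come from the text, the names are hypothetical, and small volume changes from adding TCEP and the cross-linker are ignored. The text does not say which ratios were optimized, so no design rationale is implied.

```python
from typing import Dict

CONC_uM: Dict[str, float] = {
    "PALB2 (1-103) Cys mutant": 600.0,
    "His6-TEV-ubiquitin G76C": 350.0,
    "TCEP": 6000.0,
    "1,3-dichloroacetone": 5250.0,
    "beta-mercaptoethanol (quench)": 5000.0,
}


def ratio(numerator: str, denominator: str, table: Dict[str, float] = CONC_uM) -> float:
    """Molar ratio of two components, ignoring dilution between additions."""
    return table[numerator] / table[denominator]


if __name__ == "__main__":
    pairs = [
        ("His6-TEV-ubiquitin G76C", "PALB2 (1-103) Cys mutant"),
        ("TCEP", "PALB2 (1-103) Cys mutant"),
        ("1,3-dichloroacetone", "His6-TEV-ubiquitin G76C"),
        ("beta-mercaptoethanol (quench)", "1,3-dichloroacetone"),
    ]
    for num, den in pairs:
        print(f"{num} : {den} = {ratio(num, den):.2f}")
```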

Statistics and randomization. No statistical methods were used to predetermine
sample size. The experiments were not randomized. The investigators were not 
blinded to allocation during experiments and outcome assessment. 
Extended Data Figure 1 | Suppression of PALB2-BRCA2 accumulation
at DSB sites in G1 53BP1Δ cells. a, Schematic representation of human
control. d, Wild-type and 53BP1Δ U2OS cells, either synchronized in
G1 following a double-thymidine block and release or asynchronously
dividing (ASN), were irradiated (2 Gy) and processed for γ-H2AX, PALB2,
BRCA2 and BRCA1 immunofluorescence. The micrographs relating to
BRCA1 and BRCA2 staining in G1 are found in Fig. 1a. e, Wild-type and
53BP1Δ U2OS cells synchronized in G1 after release from a
double-thymidine block were irradiated (20 Gy) and processed for γ-H2AX,
BRCA1 and BRCA2 immunofluorescence. On the left are representative
micrographs for the G1-arrested cells and the quantitation of the full
experiment is shown on the right (mean ± s.d., N = 3).
Extended Data Figure 2 | The BRCA1-PALB2 interaction is cell cycle
regulated. a, Schematic of the lacO/LacR chromatin-targeting system.
b, U2OS 256 cells were transfected with the indicated mCherry-LacR
and GFP fusions. GFP fluorescence was measured at the site of the
lacO-array-localized mCherry focus. Each circle represents one cell analysed
and the bar is at the median. Cells were also stained with a cyclin A
antibody to determine cell cycle position (N = 3). IR, ionizing radiation.
c, Representative micrographs of U2OS 256 cells transfected with the
indicated mCherry-LacR and GFP fusions; data are quantified in d.
d, Quantification of U2OS 256 cells transfected with the indicated
mCherry-LacR and GFP fusions to tether either BRCA1 or PALB2 to the
lacO array (N = 3). e, Schematic representation of PALB2 architecture
and its major interacting proteins. f, Quantification of U2OS 256 cells
transfected with the indicated GFP-PALB2 mutants and
mCherry-LacR-BRCA1-CC. Cells were also stained with a cyclin A antibody to determine
cell cycle position (N = 3).
Extended Data Figure 6 | Analysis of KEAP1- and USP11-dependent
modulation of PALB2 and homologous recombination. a, Site-specific
chemical ubiquitylation of HA-PALB2 (1-103) at residue 20 (PALB2-K20-Ub)
and 45 (PALB2-K45-Ub) was carried out by dichloroacetone
linking. The resulting ubiquitylated PALB2 polypeptides along with their
unmodified counterparts were subjected to pulldown with a fusion of
MBP with the coiled-coil domain of BRCA1 (MBP-BRCA1-CC). I, input;
PD, pulldown. Asterisk indicates a non-specific band. b, Wild-type and
KEAP1Δ 293T cells were treated with cycloheximide (CHX)
for the indicated time and then processed for NRF2 and KEAP1
immunoblotting. Actin levels were also determined as a loading control.
c, Immunoprecipitation (IP) of USP11 from extracts prepared from
293T cells that were or were not treated with camptothecin (CPT; 200 nM).
Immunoprecipitation with normal IgG was performed as a control.
d, U2OS DR-GFP cells were transfected with the indicated siRNAs.
Twenty-four hours post-transfection, cells were further transfected with
the indicated siRNA-resistant USP11 expression vectors (WT, wild type;
CS, C318S and CA, C318A catalytically dead mutants) or an empty
vector (EV), with or without an I-SceI expression vector. The percentage
of GFP-positive cells was determined 48 h post-plasmid transfection
for each condition and was normalized to the I-SceI plus non-targeting
(siCTRL) condition (mean ± s.d., N = 3). e, Schematic representation of
human USP11 (top) and KEAP1 (bottom) gene organization and
targeting sites of sgRNAs (as described in Extended Data Fig. 1a)
used to generate the USP11Δ and USP11Δ/KEAP1Δ 293T cells.
The indels introduced by CRISPR-Cas9 and their respective
frequencies are indicated. The USP11 knockout was created first and
subsequently used to make the USP11Δ/KEAP1Δ double mutant.
f, Immunoprecipitation of PALB2 from extracts prepared from 293T
cells transfected with the indicated siRNA and with or without CPT
(200 nM) treatment. Immunoprecipitation with normal IgG was
performed as a control.
Extended Data Figure 7 | USP11 antagonizes KEAP1 action on PALB2.
a, U2OS DR-GFP cells were transfected with the indicated siRNAs or
left untransfected (−). Twenty-four hours post-transfection, cells were
transfected with an I-SceI expression vector (circle). The percentage of
GFP-positive cells was determined 48 h post-plasmid transfection for each
condition and was normalized to the I-SceI plus non-targeting (CTRL)
condition (mean ± range, N = 3). b, Parental 293T cells (wild type (WT))
Extended Data Figure 8 | Characterization of USP11 protein stability.
a, U2OS cells synchronized in G1 or S/G2 were treated with cycloheximide
(CHX) and processed at the indicated time points to monitor USP11
stability. b, Immunoprecipitation (IP) of PALB2 from extracts prepared from
293T cells that were synchronized in G1 or S phase and treated or not with
ionizing radiation (IR; 20 Gy). c, U2OS cells were irradiated with a dose of
2 or 20 Gy and processed for USP11 immunoblotting at the indicated times
post-ionizing radiation. Actin was used as a loading control. d, U2OS cells,
mock treated or incubated with the ATM inhibitor KU55933 (ATMi), ATR
inhibitor VE-821 (ATRi) or DNA-PKcs inhibitor NU7441 (DNAPKi), were
irradiated (20 Gy) and processed for USP11 and actin (loading control)
immunoblotting. e, Similar experiment to d except that cells were exposed
to ultraviolet (UV) radiation (50 mJ cm⁻²). f, U2OS cells, mock treated or
incubated with the proteasome inhibitor MG132, were irradiated (20 Gy)
and processed for USP11 and actin (loading control) immunoblotting.
g, U2OS cells, mock treated or incubated with the cullin inhibitor
MLN4924, were irradiated (20 Gy) and processed for USP11 and actin
(loading control) immunoblotting.
Extended Data Figure 9 | Reactivation of RAD51 loading and
unscheduled DNA synthesis in G1. a, 53BP1Δ U2OS cells were
transfected with the indicated siRNA, synchronized in G1 or S/G2 by
release from a double-thymidine block and irradiated (20 Gy) before
being processed for fluorescence microscopy. DAPI was used to trace the
nuclear boundary and cyclin A staining was used to determine cell cycle
position. The percentage of cells with more than five γ-H2AX-colocalizing
PALB2 foci is indicated as the mean ± s.d., N = 3. Scale bar, 5 μm.
b, Representative micrographs of irradiated G1-synchronized wild-type
(WT) and 53BP1Δ U2OS cells transfected with the indicated siRNA
and expressing wild-type CtIP. c, Representative micrographs of
(2 Gy) and processed for γ-H2AX and BrdU immunofluorescence. The
targeted with the CRISPR-mClover system showing the typical
perinuclear expression pattern of lamin A. f, Micrograph of a U2OS
Stewards of China’s future


The 2015 Nature Awards for Mentoring in Science recognize 
Chinese scientists who have invested in the next generation. 


BY ED GERSTNER 


During the past two decades, Chinese science has undergone profound
growth. China's investment in research and development surpassed that
of the European Union in 2013, and it is predicted to overtake that of
the United States by the end of the decade (see Nature http://doi.org/w5r; 2014).
The proportion of published scientific papers that include Chinese co-authors
has jumped from 2.4% in 1997 to 19% in 2014 — second only to the US
contribution last year of 25%. Those statistics are impressive. But if China
is to become a true scientific superpower, it must be able to produce great
scientists who are not just knowledgeable but also creative and skilled in
innovation. And great scientists need great mentors to lead the way.

In recognition of the vision, dedication and 
hard work of those charged with nurturing the 
next generation of Chinese researchers, this 
year’s Nature Awards for Mentoring in Science 
honour five researchers in China. The winners, 
feted in an 8 December ceremony, were chosen 
by panels composed of Chinese scientists and 
Springer Nature editorial representatives (see 
go.nature.com/hdi5k7). Submissions included
statements from five people who had been
mentored by the nominee and statements from 
the nominees reflecting their own thoughts 
on mentoring. 

Owing to China’s size, submissions were 
divided into ‘north’ and ‘south’, with awards for
lifetime and mid-career achievement in each. 
The 50,000-yuan (US$7,815) lifetime-achieve- 
ment award for northern China was shared 
between immunologist Xuetao Cao, who is 
president of the Chinese Academy of Medical 
Sciences, and plant scientist Xingwang Deng, 
dean of the School of Advanced Agricultural 
Sciences at Peking University. The winner for 
southern China is Hongyuan Chen, an electro- 
analytical chemist and director of the Institute 
of Chemical Biology at Nanjing University. 

In the mid-career category, the 50,000-yuan 
awards for northern and southern China 
went, respectively, to Yigong Shi, a structural 
biologist and dean of life sciences at Tsinghua 
University in Beijing, and Hongbing Shu, an 
immunologist at Wuhan University. 


CHALLENGE TO CONVENTION 

Like many Asian nations, China is often seen 
as a place of rigid hierarchies rooted in def- 
erence to power. One trait shared by all the 
winners, and indeed by all those nominated, 
is an understanding that the only authority in 
science is evidence — and that conventional 
wisdom must always be open to question. 

Shi, who was named a chair professor of 
molecular biology at Princeton University 
in New Jersey before he returned to China in 
2008, thinks that most Chinese students are 
too wary of contradicting senior researchers 
and accepted scientific ideas. “I encourage my 
students to think critically and to challenge the 
authorities, including myself, so that they can 
learn that established rules can be broken, and 
with that, new fields of research can be built,” 
he says. 

Cao agrees. “We should inspire students to 
have confidence to challenge the dogma in the 
textbook and address fundamental questions 
in science,” he says. 

The lesson is not lost on the winners’ 
protégés. “The scientific literature is a baffling 
mass of conflicting ideas and results, accepted 
wisdom and false assumptions,” notes Weilin 
Chen, a cancer immunologist at Zhejiang Uni- 
versity and one of Cao’s former PhD students 
at the Second Military Medical University in 
Shanghai. “Professor Cao often said that crea- 
tivity comes from different directions with 
different views,” she says. “And he treats
everyone, regardless of whether they are
a PhD student or a visiting scholar, with the 
same high regard.” 

In the past, most Chinese labs were indeed 
quite rigid, with a single senior professor 
directing junior professors, postdocs and 
students along strictly hierarchical lines. 
With the rapid expansion of research insti- 
tutes, however — fuelled by a large influx of 
researchers returning from overseas — the 
structure of many labs has begun to follow a less- 
hierarchical model, with many independent 
principal investigators all pursuing their own 
agendas and research directions. 


A TEACHER’S PHILOSOPHY

The mentors honoured by Nature have 
recognized the importance of instilling young 
researchers with the self-confidence that they 
need to establish their own intellectual identity 
and to make their own way in the world. “In 
my opinion, simply imparting knowledge is 
not enough,” says Hongyuan Chen. “A mentor 
should teach students the way of thinking. In 
the area of science, I guide my students to think 
in a scientific way, and give them the opportunity
to solve problems independently.”

He thinks that a good mentor must have a 
keen sense of when a student requires guidance 
and when he or she needs freedom. “For stu- 
dents who are just starting out, we need to give 
them more-detailed instructions to let them 
get used to research gradually,” he says. “And 
for those who have a solid knowledge base, 
strong independence and creativity, I let them 
think and practise in their own ways.” 

Jingjuan Xu, a former PhD student of 
Hongyuan Chen’s and now an analytical 
chemist at Nanjing University, says that Chen 
provided an open environment that fostered 
imagination and creativity. “He encouraged us
to read philosophy and literature, and think
from different aspects,” recalls the chemist.
“He said that every student is an independent,
thinking being; a good mentor should nurture them
to become ‘horses’ rather than ‘sheep’.”

Good mentors also recognize that it is not
enough to produce successful scientists
— it is just as important to teach others how to
be effective, inspiring leaders themselves. Lei Li,
a postdoc of Deng’s at Yale University in New
Haven, Connecticut, and now a professor in
the School of Life Sciences at Peking University,
recounts her own training in Deng’s lab.
“As I became more senior in the lab, Professor
Deng started to ask me to help others in their
lab techniques and in reading their manuscripts,
which I soon realized was part of a system,” she
says. “When he discovered performance issues,
he never just criticized; he took time to find the
root of the problem. And in several instances,
he delegated me to do the pep talk.”

The testimonials for the award winners all 
strongly reflect the scientists’ unwavering dedi- 
cation to the success of their protégés. But one 
story in particular stands out. 

In 2005, immunologist Bo Zhong, now at 
Wuhan University, applied to do a PhD in 
Hongbing Shu’s lab after graduating with a 
major in English. “I was determined to study 
biology after graduation because I was inter- 
ested in nature,” says Zhong. At Wuhan, “Dr 
Shu had recently been appointed as dean of life 
sciences, and his group [at the National Jewish 
Medical and Research Center in Denver, Colo- 
rado] had just published a milestone discovery 
in Molecular Cell. Every student with ambition 
wanted to join his lab — and so did I.”


NEVER GIVE UP 

Zhong knew that it wouldn't be easy. “I had to 
admit that my background was much weaker 
than those who majored in biology,” he says. “I
downloaded all his publications but found that 
I could hardly understand them. I knocked on 
the door to his office, and asked many naive 
questions. He patiently explained the details, 
recommended more publications to me and 
encouraged me to ask him if I had any diffi- 
culty in understanding the studies. Following 
his instructions, I read more papers, and wrote 
a five-page summary about pattern recogni- 
tion and signalling, and asked whether I could 
join his lab. To my surprise, he agreed.”

Shu admits that he was unsure about 
Zhong’s potential at first, but after seeing his 
determination, Shu felt that Zhong deserved a 
chance to show what he could do. He doesn't 
regret the decision. “After I was convinced of 
his ambition and drive for a scientific career, 
I took him without hesitation. He has so far 
proved himself as one of the most successful 
students trained in my lab.” After taking him
on, Shu asked Zhong to turn the summary that
he had written into a full review paper, which
became the first publication to come out of the
newly formed lab.

Immunologist Xuetao Cao (left) and plant scientist Xingwang Deng (right) both won mentoring awards.

Shu thinks that patience and perseverance 
are among the most important traits of good 
mentorship, something he learnt from one of 
his own mentors: his PhD supervisor, Harish 
Joshi, a cell biologist at Emory University in 
Atlanta, Georgia. “I have always remembered 
what he told me when I was in his lab. ‘Do not
fire them; fire them up!’” Shu recalls. “In my
17 years’ mentoring life, I have never given up
on any one of my students.” 

A well-known Chinese saying goes, “If 
someone is your teacher for just one day, you 
should regard that person as your parent for 
the rest of your life.” The influence that great
mentors have does indeed live long — and not 
just in their students, but in their students’ 
students. “When I started my own lab in 
2012, I often asked myself what Yigong would 
do,” says Liang Feng, a structural biologist at 
Stanford University in California and a former 
PhD student of Shi’s. “I kept all e-mail com- 
munications Yigong sent to me or to the lab, 
and often went back to read them. They are 
like a ‘how-to’ guide for running a lab. For me 
and many others, Yigong was not only a great 
mentor and a role model, but also a relentless 
supporter and a lifelong friend.” 

The word used to describe the most revered 
teachers, shifu — a portmanteau of the words 
for teacher, laoshi, and father, fuqin — echoes
the deep connection that forms between 
exceptional mentors and their protégés. None 
of the scientists who nominated their mentors 
for an award takes this filial bond for granted. 
In the words of Hongyuan Chen’s protégé Jing- 
juan Xu, “I think that ‘father’ is really too high 
a standard to expect from a teacher. But we are 
the lucky children, because Professor Chen 
treated us like his own kids.”


Ed Gerstner is executive editor for Nature 
journals in Greater China, based in Shanghai. 
SCIENCE FICTION

CITADEL

How to survive the solstice.

BY JOHN GILBEY

One guard at the gate of the Citadel
nodded acknowledgement as I
walked up the worn stone steps
towards her, my boots crackling
in the frost. “A cold night,
pilgrim,” she intoned, her
breath visible on the night air. I
smiled to myself, realizing that
she had mistaken my hastily
snatched robes for the garb
of a common supplicant. The
temptation was too great and
I twitched aside my cloak to
reveal my cryptogram of office.

The reaction of horror was
immediate, so I decided to
spare her rank — and so, possibly,
her life. “Your pardon, Eminence, I
took you for another ...” Her fear was
real and I felt a tang of guilt like the first
nip of frostbite. I made the sign of peace
and she relaxed, fractionally.

“We are all pilgrims, Guardian...
Worker or healer, hunter or feeder, we
must all play out our duty to the faith
with honour and respect according to our
trade.” Aware that I was almost quoting
from the creed, she cast her eyes downward
and muttered a blessing under her breath. I
paused while she recovered herself, gauging
her adherence to protocol as she scanned
my cypher and released the portal. “Pass,
my Lord...”

The great hall beyond the robing room
was quiet, aside from the distant sound of
the wind moving around the tower above.
The hall had set itself for night, so only a
low glow followed my progress towards the
display. My Lady stood, apparently deep in
reverie, before the map that once purported
to describe our world — but which we now
know is wildly untruthful. She looked up
at my approach and pointed to a small
red stain on the curved stone of the panel.
“Another is gone — that makes three since
midsummer...”

News indeed, and of an import that
explained the urgent summons. When we
first inherited this duty, only a handful of
the myriad marks on the plot were red —
the great majority being green, the colour
of plants and life. Now, 50 summers later,
close to half have adopted the deep crimson
of mourning. Lore tells us that these marks
speak of the health enjoyed by those sentinel
obelisks of impervious metal that gird our
lands, that somehow talk to the citadel and
help to build the intricate coloured patterns
that scatter the map.

These red marks prey on my mind, seem- 
ing to signify the loss of so much more. So 
many deaths! Folk whose faces glide before 
me every time I visit this chamber, yet I must 
remain resolute — as so many others have 
done before me. If only my parents had not 
been among those who perished in that sud- 
den, crippling spring ice-fall when I was still 
so small a child. The knowledge and learn- 
ing that died with them cannot be replaced, 
yet the common people still look to the Lady 
and me for counsel and guidance.

As so many times before, I walked for- 
ward and laid my hands flat on the stone 
panels as if to commune with the hidden 
forces within. With a finger tip I gently 
traced the outlines of the glyphs that I feel 
certain hold the key to the secret knowledge 
— then hung my head in shame and frustra- 
tion. I felt a warm hand slide into mine and 
squeeze it lightly. 

“You are troubled, husband.” The smile
of the Lady should be able to melt the ice
fields that surround us, but that would
mean sharing it with others. I turned to
her, and we embraced for the first time in many days. I held her
long and close. “Our world may be dying,
yet I haven't the wisdom to read the signs.
Lines traverse the plot, growing longer
with every passing solstice — yet I've
not the skill to know whether they
should alarm or reassure by their
rise and fall.”

My Lady took my arm and
turned me to face the empty
hall. “Have you considered,
husband, why the ancients
who built this place made the
hall so large? Surely it is not
meant just for the two of us —
who alone may enter it today?
Perhaps one of the pilgrims
who freezes in the courtyard
beyond has a shred of wisdom to
add to our own? Indeed, mayhap
many of them have much to give?
Each traveller who is sent here is the
most skilled of each hearth and trade —
that is surely a sign to us...”

The shock in my face drove her back a 
step. “For many generations it has been 
thus,” I stormed. “They stand below and we 
assure them that all is well, that we maintain 
control. We alone must hold this place — 
else who in the land will know their point 
and worth?” 

Her eyes, wide and dark in the poor light, 
clouded with anger and regret. For the sec- 
ond time this night I had demeaned and cast 
down one of my folk whose only failing was 
to share their humanity. Turning abruptly 
from her, I faced the centre of the display 
and made great play of deep thought. It was 
obvious that she was right; there could be
only one conclusion. 

“Very well. When the winter solstice cele- 
bration comes we will open the hall to all 
and demand — yes, demand — that they 
share their skills so that we may deepen our 
understanding of this place.” The ghost of a
smile flitted across my Lady’s lips, before she 
lowered her gaze. 

At the base of the plot, the image writ- 
ten thus “Survival Likelihood 3” in the rock 
became faint for a moment, and the last 
glyph in the sequence changed to “4”. One 
day, our children may understand whether 
that was a good thing or not.


John Gilbey writes from the academic 
seclusion of the University of Rural England, 
where they worry about things like this. He 
tweets as @John_Gilbey. 
cancer. Globally, it is the second most common cancer in
men, and in some places it takes the top spot (page S118). 

As the prime reproductive years fade, the gland typically 
begins to misbehave. The first sign that men often experience is 
inflammation — a condition that is sometimes, but not always, 
a precursor to cancer. The interplay between inflammation and 
cancer remains an area of intense research (page S130). 

Prostate-cancer screening has provoked contentious debate 
(page S120). Blood tests for prostate-specific antigen (PSA) 
have led to the discovery of cancers at earlier and more treatable 
stages. But they have also revealed many tumours that could 
safely be left untreated. Researchers are looking beyond PSA to 
other biomarkers that could be used to tell more reliably which 
cancers need treatment (page S124). Often, the best therapeutic 
option is just to be vigilant — ‘active surveillance’ is now the 
norm (page S126). When a trip to the operating theatre is 
unavoidable, robotics is making prostate surgery less likely to 
cause adverse effects (page S132). 

Hopes for a vaccine have dimmed (page S134). The only
approved immunotherapy for prostate cancer — sipuleucel-T 
— adds mere months to survival time and is expensive. 
Researchers are focusing on combinations of therapies, such 
as a checkpoint therapy administered together with a drug 
that targets tumour hypoxia. Because prostate tumours are 
most dangerous once they escape the gland itself, intense 
efforts are targeting metastatic cancers that have become 
resistant to standard treatments (page S128). 

We are pleased to acknowledge support from Ferring 
Pharmaceuticals and a grant from Astellas Pharma Global 
Development, Inc. and Medivation, Inc. in producing this 
Outlook. As always, Nature retains sole responsibility for all 
editorial content. 
PROSTATE CANCER
SMALL ORGAN, BIG REACH 


Prostate cancer is one of the most common cancers in men — most will develop the disease 
if they live long enough. But it is not always deadly, and the number of cases often depends 
on how hard doctors look for it. By Richard Hodson, infographic by Mohamed Ashour. 


GLOBAL INFLUENCE 


The rate of prostate-cancer diagnosis varies more than 25-fold around the world. The 
incidence rate in a country is influenced by trends in diagnostic testing, which vary 
from place to place, as well as by the age and ethnic mix of a population.
ETHNICITY EFFECTS

On the Caribbean island of Martinique, men have
a 26% chance of being diagnosed with prostate
cancer by age 74 — the highest in the world. But
in Bhutan, the risk is just 0.14%. Ethnicity may
play a part. English black men have much higher
rates of the disease than Asian men (ref. 2).

Chart: cases per 100,000 English men (Asian, White, Black).

LOOKING FOR TROUBLE

The rate of prostate-cancer diagnosis in the United
States spiked after the prostate-specific antigen (PSA)
test was introduced in 1986 (ref. 3). Testing men
without symptoms is no longer recommended. In
places where the test is used less, such as the United
Kingdom, rates have increased only gradually.

HOW DEADLY?

Prostate cancer is the second most common
cancer in men worldwide, just behind lung
cancer. But for every 30 lives lost to lung cancer,
just 8 men will succumb to prostate cancer.
MAN ON THE INSIDE

The prostate gland is a male organ involved in sexual function. Its size ranges from that of a
walnut to that of a small apple, and it can become enlarged as a result of cancer, inflammation
or benign prostatic hyperplasia.

Sitting at the base of the bladder, the prostate surrounds the urethra that carries urine and
semen from the body.

Many prostate tumours begin at the edges of the gland. There can be more than one tumour
within a gland.

Men often first become aware of a prostate tumour when it puts pressure on the urethra or
bladder.

In advanced prostate cancer, parts of the tumour metastasize through the blood and lymph to
distant parts of the body, including lymph nodes and bones.

B. WAINWRIGHT, CUSTOM MEDICAL STOCK PHOTO/SPL

SURVIVAL STORY

When prostate cancer is diagnosed early, before it has spread, chances of survival are much
higher. […] of prostate cancers in the United States are diagnosed after they have spread to
distant tissues.

Chart: 5-year survival rates (%) for local and distant disease.


A CENTURY OF TREATMENT

For localized prostate cancer, the most common intervention is surgical removal of the
prostate — radical prostatectomy. If cancer has spread beyond the prostate it cannot be cured.
Suppressing male hormones slows growth, but tumours can become resistant. Since 2004,
therapies to target resistant metastatic cancer have gained US Food and Drug Administration
approval.

Treatment types: chemotherapy, hormone therapy, immunotherapy, radiotherapy.

April 1904, surgery: The first radical prostatectomy is performed by Hugh Hampton Young at
Johns Hopkins University, Baltimore.
May 2004, docetaxel: Docetaxel is a common drug in the treatment of many cancers. Before its
introduction in 2004, there was no real treatment for advanced, castration-resistant prostate cancer.
April 2010, sipuleucel-T: The only approved immunotherapy for prostate cancer, sipuleucel-T,
is costly and extends life by only a few months. Successors are in development, and combining
them with other therapies may prove fruitful (see S134).
June 2010, cabazitaxel.
April 2011, abiraterone.
August 2012, enzalutamide.
Abiraterone and enzalutamide: Testosterone causes prostate cancer to grow; abiraterone and
enzalutamide suppress testosterone. However, nearly everybody treated with the drugs develops
resistance to them over time (see S128).
May 2013, radium-223 dichloride.


Sources: 1. International Agency for Research on Cancer; 2. National Cancer Intelligence Network; 3. National Cancer Institute's Surveillance, Epidemiology, and End Results Program; 4. Cancer Research UK; 
5. Siegel, R. L. et al. CA Cancer J. Clin. 65, 5-29 (2015). 
A simple blood test is used to measure prostate-specific antigen, or PSA, but researchers continue to debate how best to use the test to save lives.


Diagnostic dilemma 


The standard blood test for prostate cancer led to a spike in diagnoses of the disease. But
the technique’s results are often misleading — and conflicting studies have not helped to
forge a consensus.


BY EMILY SOHN 


It was an appealing idea: a simple blood
test that could detect prostate cancer early,
before it could become life threatening. So
appealing, in fact, that enthusiasm for the
prostate-specific antigen (PSA) test caught on long
before there was strong evidence to support it.

PSA is a protein that is produced by the
prostate gland and is usually found in the
blood at higher levels when a prostate tumour
is present. The US Food and Drug Administration
initially approved the PSA test for cancer
monitoring in 1986, and by 1992 the US incidence
rate for prostate cancer had nearly
doubled, from 119 to 237 cases per 100,000
people. From 1992 to 2012, deaths from prostate
cancer halved, from about 39 deaths per
100,000 people to 20. “When you look at the
curves, there’s nothing else like it with other
cancers,” says Laurence Klotz, a urologic
oncologist at the Sunnybrook Research Institute
at the University of Toronto in Canada.

But scepticism also emerged early and
deepened over time, especially as two closely
watched trials produced drastically different
results — one showing a substantial benefit of
screening and the other showing no benefit at
all. In the meantime, studies have shown that 
whereas many men have had their lives saved 
by early detection with the test, many others 
have been diagnosed and treated for cancers 
that in all likelihood would never have caused 
them harm. 

“The PSA was a genie that got out of the 
bottle well before randomized trials were 
initiated,” says Michael Barry, a primary care 
doctor at Massachusetts General Hospital in 
Boston and president of the Informed Medi- 
cal Decisions Foundation, a Boston-based 
organization that advocates for evidence-based 
shared decision-making between doctors and 
patients. Even when trial results became avail- 
able, he adds, they failed to resolve the question 
of whether the test was worthwhile. 

Hundreds of studies have now analysed the 
consequences of screening. The results have 
led to important developments in testing pro- 
tocols, treatment decisions and public trust in 
the PSA test. And taken together, these find- 
ings are starting to reveal how best to use the 
test to help more people and harm fewer of 
them. Still, researchers and clinicians continue 
to debate everything from what level of PSA 
should be considered alarming to which men
should have the test in the first place. Some 
30 years after the PSA test was introduced, the 
question it raised still lacks a definitive answer 
— what is the best way to protect men from 
prostate cancer without treating those who are 
better off left alone? 

“You can support any argument you want 
depending on which data you quote,” says 
Klotz. “We are not nearing consensus.” 


THE GOOD, THE BAD AND THE OPINIONS 
PSA emerges from the prostate and circu- 
lates through the blood at levels that become 
increased for various reasons. By the mid- 
1980s, it was clear that prostate cancer was 
one of those reasons, and doctors began 
using the test to track progression of the dis- 
ease. One of the first studies to suggest that 
the PSA test might also revolutionize the 
ability to screen for cancer emerged in 1991 
(ref. 1), when researchers found that the test 
detected many more cancers than did rectal 
examination, which at the time was the best 
screening method available. 

The study included more than 1,600 men who received the PSA test, which was followed up with rectal exams and ultrasound scans if PSA levels were deemed high, as well as 300 men who underwent biopsies after being flagged during the course of clinical care. Of the 37 men in the study group who were diagnosed with prostate cancer, 12 of them would have been missed if they had received only rectal exams.

At the time, nearly 20% of men diagnosed 
with the disease had an advanced form that had 
already spread outside the prostate, so doctors 
were eager for a way to pinpoint the disease at 
an earlier, more treatable stage. The new find- 
ings offered hope that the PSA test might be 
the answer. When the study came out, media 
coverage was enthusiastic, and lead author Wil- 
liam Catalona appeared on the television talk 
show Good Morning America. “I think that kind
of kicked off the PSA era,” says Catalona, who
is director of the clinical prostate-cancer pro- 
gramme at Northwestern University Feinberg 
School of Medicine in Chicago, Illinois. 

Fritz Schröder, a professor of urology at Erasmus University Medical Center in Rotterdam, the Netherlands, remembers hearing of the study’s results with excitement and meeting with a colleague in Belgium to discuss them. Recognizing a clear need for a randomized trial to assess the PSA test’s ability to save lives, they put together the European Randomized Study of Screening for Prostate Cancer (ERSPC). This trial eventually grew to include 240,000 men from eight countries, who were randomly assigned to control and test groups, with the latter receiving PSA tests every one to four years. The first results from the ERSPC, which were published in 2009 and included nine years of follow-up data, reported a 20% drop in deaths from prostate cancer as a result of early detection with the PSA test (ref. 2). In 2014, that figure grew to 27% after analyses were adjusted to include only men who had actually complied with the screening regimen to which they were assigned (ref. 3).

Other lines of evidence have emerged to 
support screening. Since the beginning of 
widespread PSA testing, Catalona says, there 
has been an 80% drop in the percentage of 
patients in the United States whose cancers are 
metastatic at the time of diagnosis — a major 
factor in the declining US death rate from the 
disease. Trends are similar in other countries 
that have adopted screening, Catalona adds, 
with a link between when screening started 
and when death rates began to drop. Denmark, 
for example, started encouraging screening 
later than did other Nordic countries, and 
Danish prostate-cancer mortality levelled off 
later than in those neighbouring countries. 
Other researchers disagree about how many of those lives were saved as a result of the PSA test, because treatment has also improved during the same period.

TO SCREEN OR NOT TO SCREEN

Screening for prostate cancer with the prostate-specific antigen test produces an array of outcomes. If 1,000 men between the ages of 55 and 69 are screened every 1 to 4 years for a decade, then ...

... between 100 and 120 men will have false-positive results that cause anxiety and may lead to further investigations with potential side effects.

... 110 men will receive an accurate diagnosis of prostate cancer, and of these men ...

... between 4 and 5 will die from the disease, despite screening ...

... and, at best, just one death from prostate cancer will be avoided.

At least 50 of the 1,000 men screened will have complications that result from subsequent treatment, including erectile dysfunction, incontinence and, in rarer cases, serious cardiovascular events.

Source: US Preventive Services Task Force

Still, PSA-test advocates also point to an 
intangible benefit: peace of mind for men 
whose result indicates a low risk of prostate 
cancer. In two studies — one of men in their 
40s and the other of men aged 60 — a PSA 
value of below 1 has been linked to a very low 
likelihood of developing aggressive cancer for 
many years afterwards. “There is no other
biomarker,” Klotz says, “that gives you a 20-year
predictive value of getting a common cancer.” 

As encouraging as these findings may 
sound, consensus on them has been madden- 
ingly elusive. When the ERSPC published its 
first results, the same journal issue published 
conflicting findings from another large trial, 
which included more than 76,000 US men who 
were randomly assigned to two groups: one 
that received a PSA test and rectal exam, and 
one that did not. This study, the US Prostate, 
Lung, Colorectal, and Ovarian (PLCO) Cancer 
Screening Trial, showed no reduction in deaths 
from PSA testing after 11 years of follow-up (ref. 4).

The two contradictory trials remain at 
the centre of debates about the benefits of 
PSA screening. The PLCO trial in particular 
has been criticized for widespread failure of 
subjects to comply with experimental condi- 
tions. Many men in the control group had 
a PSA test, whereas many assigned to the 
screening group went unscreened. Without 
adjustments to account for compliance rates, 
critics argue that the two groups were essen- 
tially the same. Still, the results continue to 
be included in major reviews, including the 
most recent analysis by the US Preventive 
Services Task Force (USPSTF), an inde- 
pendent panel of experts based in Rockville, 
Maryland, that makes evidence-based rec- 
ommendations about preventive services. 

The European trial has not escaped criti- 
cism, however. Even a 20% reduction in risk 
of death would add up to just one fewer case 
of metastatic prostate cancer per 1,000 men 
screened over 13 years. That short time hori- 
zon is problematic, says Barry, who argues 
that 13 years is not sufficient to assess the 
long-term effects of a test on a disease that 
often occurs later in life. Moreover, during 
the same time period that the PSA tests were 
being introduced, drastically improved treat- 
ments were entering the clinic — a devel- 
opment that may well account for better 
outcomes. “People accept that there is some 
benefit due to screening,” Barry says, “but
how much is a subject of debate.” 
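The contrast between a 20% relative reduction and roughly one case avoided per 1,000 men is the contrast between relative and absolute risk. The short Python sketch below makes that arithmetic explicit. It is illustrative only. The baseline of about 5 prostate-cancer deaths per 1,000 unscreened men is an assumption, chosen to be roughly in line with the ‘To screen or not to screen’ figures rather than taken from the ERSPC itself. The function names are hypothetical.

```python
"""Relative versus absolute risk reduction: an illustrative sketch.

Assumption (hypothetical, for illustration only): about 5 prostate-cancer
deaths per 1,000 unscreened men over the follow-up period.
"""


def events_avoided_per_1000(baseline_per_1000: float, relative_reduction: float) -> float:
    """Absolute risk reduction, expressed as events avoided per 1,000 people."""
    if not 0.0 <= relative_reduction <= 1.0:
        raise ValueError("relative_reduction must be a fraction between 0 and 1")
    return baseline_per_1000 * relative_reduction


def number_needed_to_screen(baseline_per_1000: float, relative_reduction: float) -> float:
    """People who must be screened to avoid one event (the inverse of the absolute reduction)."""
    avoided = events_avoided_per_1000(baseline_per_1000, relative_reduction)
    if avoided == 0:
        return float("inf")  # no benefit means no finite number of screenings helps
    return 1000 / avoided


if __name__ == "__main__":
    baseline = 5.0  # hypothetical deaths per 1,000 unscreened men
    rrr = 0.20      # the 20% relative reduction reported by the ERSPC in 2009
    print(f"Deaths avoided per 1,000 screened: {events_avoided_per_1000(baseline, rrr):.1f}")
    print(f"Men screened per death avoided: {number_needed_to_screen(baseline, rrr):.0f}")
```

Under these assumptions, a 20% relative reduction applied to a small baseline risk gives about one death avoided per 1,000 men, or roughly 1,000 men screened per death avoided. The same trial result can therefore be framed as impressive or as modest.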


BETTER NOT TO KNOW? 

Whatever their magnitude, the benefits 
of PSA screening come with some seri- 
ous downsides. One complicating factor is 
that PSA levels can be increased for reasons 
that have nothing to do with cancer, includ- 
ing urinary-tract infections, inflammation 
and enlargement of the prostate, a benign 
condition that becomes increasingly common
as men age. As a result, many men who are 
flagged for follow-up by their high PSA levels 
do not have cancer at all. It is also common for 
men to harbour non-aggressive, slow-grow- 
ing tumours for many years, and to eventually 
die from some other cause — which means 
that even men who do have cancer often 
do not need to know about it. The authors 
of one autopsy study found prostate cancers 
in 64% of men in their 60s who had died of 
something other than prostate cancer (ref. 5). In the
United States, a man has about a 14% risk of 
developing prostate cancer in his lifetime, 
but less than a 3% risk of dying from it. As a 
result, attempts to catch aggressive prostate 
cancers early have ensnared many men who 
never should have become cancer patients in 
the first place. 

The discovery of an increased PSA level 
presents an array of potential risks. Biopsies 
can cause pain, fever, blood in the urine and 
infections, which are increasingly resistant to 
antibiotics. And because biopsies sample only a fraction of the prostate, they are not regarded as conclusive — uncertainty that often means further tests and biopsies even after a negative result. “Patients are happy to get a blood test,” says Peter Albertsen, a urologist at the University of Connecticut Health Center in Farmington, “but that starts the ball rolling downhill, and it can lead to all sorts of consequences.”

Wherever screening is widespread — includ- 
ing the United States, Australia and parts of 
Europe — unnecessary treatments are ram- 
pant, says Barry. The US National Cancer 
Institute estimates that for every 1,000 men 
screened regularly with the PSA test over the 
course of a decade, as many as 120 will get a 
false-positive result that may lead to a biopsy. 
Another 110 will get a cancer diagnosis (see
‘To screen or not to screen’). And nearly half
of those 110 will have complications from
treatment, including incontinence and sexual
dysfunction. “Overdiagnosis,” Schröder says,
“occurs at a rate that we find very disturbing.”

Cancer diagnoses carry an understudied psy- 
chological burden, Barry adds, and anxiety can 
linger even after reassuring biopsy results. There 
are hefty financial costs, too. In a 2011 analysis 
of data from the ERSPC that was extrapolated 
to the United States, researchers estimated that 
preventing one death from prostate cancer costs 
more than US$5 million in screening, biopsies 
and treatments (ref. 6). “If we treat patients for cancer
and they die the same day they were destined 
to die from a heart attack,” says co-author Alex 
Shteynshlyuger, a urologist in private practice 
in New York, “what good have we achieved?” 

Based on the seemingly high rate of potential
harm, the USPSTF updated its recommendations
in 2012 to advise against routine PSA
screening for all men. The United Kingdom 
has also decided against a national prostate- 
cancer screening programme owing to a lack 
of convincing evidence to support the PSA test. 
Still, other doctors and organizations continue 
to recommend screening, with variations in 
what age it should begin, how frequently tests 
should occur and what PSA levels should be 
considered concerning. The result is confusion 
for men who want to make informed decisions 
about their health. 


BLAMING THE MESSENGER 

As scientists grapple with the data, there is 
another ongoing problem: the data keep 
changing because doctors are getting bet- 
ter at selecting the most eligible patients for 
screening and treatment. People are also 
making different choices about screening, 
with drops in both the number of US men 
having the PSA test and the number of 
prostate-cancer diagnoses, according to two 
studies published in November”. Still, disa- 
greement persists about where the balance 
lies, and those arguments continue to rely on 
trial data that are becoming obsolete. “There 
are very few things we are doing today that 
we were doing the same way when the stud- 
ies began,” Shteynshlyuger says. “The tectonic 
plates keep moving under our feet.” 

Part of the shift is a result of advances in 
screening, which are helping doctors to zero 
in on aggressive cancers that need the most 
attention. Among the new strategies is a 
tool called the prostate health index (PHI), 
which measures three types of PSA. Accord- 
ing to some research, the PHI is three times 
more specific than the standard PSA test, an 
improvement that reduces the number of 
unnecessary biopsies. Doctors around the 
world also now factor in a tumour’s Gleason 
score, which assesses aggressiveness based 
on the way that cancer cells look under a 
microscope. And researchers are continually 
re-examining the level at which the quantity 
of PSA in the blood should be considered 
abnormal. Some evidence, for example, sup- 
ports the idea that the threshold for concern 
should be raised from its present value of 
3-4 nanograms per millilitre to 10 nano- 
grams per millilitre. Beyond PSA, scientists 
are also using magnetic resonance imaging 
to guide biopsies, making false negatives less
likely, as well as genetic tissue tests to screen 
for biomarkers that signal a cancer’s degree 
of aggressiveness (see page S124). These tests 
can be expensive, and health-insurance com- 
panies in the United States do not necessarily 
cover them. Many are so new, Barry adds, that 
there are insufficient data on outcomes. Rush- 
ing to accept newer tests before sound trial 
evidence arrives, in other words, might bring 
a repeat of the troubled PSA era all over again. 

But the real crux of the screening debate is 
what happens when results come in — and that
is where big changes are happening. Within 
the past decade, for example, there has been a 
major spike in the number of men with low-risk 
cancers who choose to forgo treatment, instead 
taking a wait-and-see approach known as active 
surveillance, which, depending on the situation, 
could mean periodic screening or careful
observations of symptoms (see page S126).

In 2006, 90% of US men diagnosed with 
prostate cancer were treated for it, says Stacy 
Loeb, a urologist at New York University 
School of Medicine. Today, only 50-60% opt 
for treatment. Sweden has been particularly 
quick to adopt the strategy: 91% of men with 
very low-risk and 74% of men with low-risk 
prostate cancer in the country now opt for 
active surveillance. As fewer men are treated, 
one hope is that the benefits of PSA testing will 
begin to outweigh the harms. “There is a lot of 
controversy about screening because it used to 
be done in such a very rudimentary fashion,” 
Loeb says. “We have come to recognize that it’s 
not so black and white.”

Given the uncertainties, many experts 
now recommend an approach that considers 
each patient's situation individually. Statisti- 
cal tools are helping with the process; at the 
University of Texas in San Antonio, for exam- 
ple, researchers used data from thousands of 
biopsies to create an online calculator that 
incorporates age, race, family history, PSA 
score and other factors into a recommenda- 
tion that doctors and patients can consider 
together. This kind of shared-decision- 
making strategy is currently recommended 
by organizations such as the American Uro- 
logical Association. 

Forthcoming data may soon make screen- 
ing decisions even more informed. In Janu- 
ary 2016, researchers are expected to release 
10-year follow-up results from the Prostate 
Testing for Cancer and Treatment (PROTECT) 
trial, which includes more than 1,600 British 
men who were diagnosed with localized
prostate cancer using PSA tests and then randomly
assigned to one of three treatment options, 
including active surveillance. But based on the 
history of PSA testing, it is hard to imagine that 
any fresh results will settle disagreement about 
screening once and for all.


Emily Sohn is a freelance journalist in 
Minneapolis, Minnesota. 
PERSPECTIVE

Enforce the clinical guidelines

Prostate-specific antigen is not a bad test, it is just improperly applied, says Monique Roobol.

… cancer in men (excluding non-melanoma skin cancer). In the United States alone, there will be 220,800 new cases and about 27,540 deaths from the disease in 2015 (ref. 1).

Not all prostate cancers are the same. Some cases are very aggressive, causing painful bone metastases and turning deadly, whereas others can stay dormant throughout the patient’s life. This means that prostate cancer is only the second biggest cause of cancer deaths in US men, behind less-common lung cancer. So although a man’s lifetime risk of being diagnosed with prostate cancer is 1 in 7, the risk of dying from prostate cancer is only 1 in 38.

A lot of prostate cancers are, therefore, overdiagnosed: they are unlikely to ever cause harm, let alone death. This overdiagnosis is initiated by the liberal application of a cheap, easy-to-apply and sensitive blood test: the prostate-specific antigen (PSA) test. And, crucially, this test is given to too many men, or too often, against best-practice guidelines.

To understand the current situation, it is helpful to outline the history of the test. From the mid-1980s until the early 1990s, PSA was officially used only to monitor the course of prostate cancer in men who were already diagnosed. At the time, prostate cancer was a life-threatening disease: one in every two or three patients died. In 1994, a team from Washington University School of Medicine in St Louis, Missouri, showed that adding a PSA test to a digital rectal examination increased the rate of early detection of the cancer — when the disease is confined to the prostate — by 78% (ref. 2). The same year, the US Food and Drug Administration approved this test combination to help detect cancer, and it was rapidly adopted. Physicians were able to actively seek out the disease, and it soon became clear that prostate cancer was actually very common.

These findings raised two questions. First, is it possible to reduce prostate-cancer mortality if the PSA test is introduced as a screening tool? And second, is it possible to reduce the side effects of PSA screening, including overdiagnosis? To address these questions, two randomized trials — one in the United States (ref. 3) and one in Europe (ref. 4) — were initiated. Both trials have reported on the effect of PSA testing on prostate-cancer mortality several times over the years, and have always contradicted each other (although it is generally accepted that within the US trial, contamination substantially limited researchers’ ability to identify a clinically significant screening benefit). This lack of consensus and the considerable risk of overdiagnosis associated with PSA-based screening are the main reasons that screening for prostate cancer is still highly controversial, and why there are so few population-based government-initiated screening programmes.

What has become much clearer, however, is how to use the PSA test in such a way that the side effects are reduced. There are numerous papers describing how and when to use the PSA test. One of these outlined five golden rules (ref. 5). PSA testing should not be carried out without pretest counselling and explicit consent. Do not test in circumstances where screening clearly has no benefit — if a man has an estimated life expectancy of less than 10-15 years, or if he is over 60 years old and has a PSA level of less than 1 nanogram per millilitre. The decision to perform a prostate biopsy — the next stage in a cancer diagnosis — should be taken based on multiple parameters and not solely on the PSA level. And a diagnosis of prostate cancer should not automatically lead to treatment.

Most of these recommendations have been included in the various 
national or regional guidelines on prostate-cancer screening, but are 
not being followed. American Urological Association (AUA) guide- 
lines published in August stated that “screening patterns have been 
inappropriate and require modification” (ref. 6). The same holds for Europe,
where modern screening practices go against the European Associa- 
tion of Urology (EAU) guidelines. Notably, the highest screening rates 
are seen in men aged 75 or older, and men with a PSA of less than 
1 nanogram per millilitre are being tested much 
too frequently (ref. 7).

There are benefits to using the PSA test, 
including a reduction in incidence of metastatic 
disease (ref. 8) and in prostate-cancer mortality. But
too many physicians are applying the test oppor- 
tunistically and inappropriately. Doing so only 
highlights the much-debated drawbacks. But, 
when used judiciously and according to a fixed 
algorithm, these flaws can be avoided. 

The time has come to actually implement the 
evidence-based guidelines into clinical practice. 
Medical associations should better communi- 
cate the best practice around PSA testing and 
strengthen the education of doctors — particu- 
larly general practitioners (GPs) who are usually 
the first point of contact, but are rarely up to date 
with the latest publications. GP requests for test- 
ing should be actively monitored to ensure the message is understood, 
rather than waiting for registry data to see if there has been an effect. 

There is ample knowledge of how to streamline individual testing 
of men who have been appropriately informed. The PSA test is a key 
part of the urologist’s toolkit. By implementing the EAU and AUA 
guidelines on prostate-cancer screening into clinical practice and stop- 
ping its misuse, we can prevent the loss of a screening test that has the 
potential to bring benefit to many men.


Monique Roobol is an epidemiologist at the Department of Urology, 
University of Erasmus, Rotterdam, Netherlands. 
e-mail: m.roobol@erasmusmc.nl 
Work to determine which prostate cancers are truly
dangerous may finally be coming to fruition. 


BY SARAH DEWEERDT 


A little knowledge can be both a blessing
and a curse. Ever since the prostate-specific
antigen (PSA) test was introduced in the United States as a method of
screening for prostate cancer in the mid-1990s, 
physicians, scientists and public health officials 
have been wrestling with the problem of how 
to use it.

The blood test looks for high levels of
PSA — an enzyme that thins the semen to
allow sperm to swim freely — and enables early 
detection of one of the most common forms of 
cancer. But it is far from infallible. A higher than 
average PSA reading is not necessarily the work 
of a malignant tumour, so the test flags many 
men who do not have cancer. And because 
prostate cancer is often indolent, meaning it is 
slow-growing and unlikely to spread, many of 
the cancers that are detected would never have 
threatened a man’s health if left untreated. “The 


S124 | NATURE | VOL 528 | 17 DECEMBER 2015


© 2015 Macmillan Publishers Limited. All rights reserved 


issue with prostate cancer is not necessarily 
detecting it early enough, but predicting which 
cancers are aggressive and which are indolent,”
says Vadim Backman, a biomedical engineer at 
Northwestern University in Evanston, Illinois. 

This conundrum has led to a difference of 
opinion about how widely to use the PSA test, 
and what action to take if cancer is detected (see 
page S120). PSA screening had quickly become 
widespread in the United States, but in 2012 
the US Preventive Services Task Force recom- 
mended against routine screening. 

The controversy has, however, also stimu- 
lated scientific creativity. Researchers are 
improving the way in which men who are likely 
to have aggressive forms of prostate cancer are 
identified to reduce unnecessary biopsies. And 
to cut down on needless treatment, they are 
developing better ways to evaluate biopsy tis- 
sue and determine which tumours truly pose a 
threat. Because prostate cancer has a long natu- 
ral history, definitive studies to address these 
issues take a long time to complete. But a flurry
of publications over the past few years, as well 
as the commercial introduction of several tests, 
suggest that scientific patience is paying off. 


BEYOND PSA 

To do a better job of deciding which men 
should have prostate biopsies, physicians need 
non-invasive tests, either to supplement PSA 
screening or to replace it entirely. “The most 
pressing need is to identify biomarkers that are 
specific for high-grade cancer,” says urologist
Scott Tomlins at the University of Michigan 
in Ann Arbor. An ideal biomarker would only 
be expressed in prostate tissue, not elsewhere 
in the body, and only be found in aggressive 
cancer, not low-grade disease. To be useful as a 
screening test, the biomarker would also need 
to show these patterns in blood or urine, not 
just intact tissues (ref. 1).

One approach is to improve on the concept of 
the PSA test with tests that can be used to spot 
patterns in particular forms of PSA or suites of 
other, related molecules in the blood that are 
more specifically linked to aggressive prostate 
cancer. One version of this approach, the pros- 
tate health index, integrates three forms of PSA 
into a single score, which is then used to
determine the risk of an aggressive tumour.
Another, the 4Kscore test, measures a panel of
four molecules, including two forms of PSA 
that, like PSA measured in the established test, 
belong to a group of enzymes called kallikreins. 

A study of biopsy tissue from more than 
6,000 men found that screening using the 
four-kallikrein panel could reduce the num- 
ber of unnecessary biopsies — 43% fewer 
biopsies compared with the standard PSA test 
and a delay in the diagnosis of only a handful 
of aggressive cancers (ref. 2). Another study bolsters
these results. Researchers followed a cohort of 
men for more than 15 years, and found that the 
blood test predicts which men are more likely to 
develop metastatic prostate cancer in the long 
term (ref. 3). “I think that sends a very strong message
that the way we measure this actually predicts
something biological and disease-relevant,” 
says Hans Lilja, a clinical chemist at Memorial 
Sloan Kettering Cancer Center in New York and 
a leader of both studies. 

Scientists are also developing urine-based 
screening tests. One of these tests measures 
levels of the biomarkers TMPRSS2-ERG and 
PCA3. About 80% of men with prostate can- 
cer have at least one tumour that produces 
TMPRSS2-ERG — the result of a genetic 
scrambling that occurs very early in the devel- 
opment of many prostate cancers, leading to the 
fusion of two genes. PCA3, a molecule normally 
produced by prostate tissue, occurs at abnor- 
mally high levels in at least 90% of prostate 
cancers — but, unlike PSA, it is almost never 
elevated in benign conditions. “The unique 
thing about those is they're very prostate-cancer 
specific,” says Tomlins.

Individuals with high levels of these two 
biomarkers in their urine tend to have a large 
amount of tumour in their prostate, Tomlins 
says. This itself is a good indicator that an 
aggressive cancer is at work. In May, his team 
reported that the two markers do a better job 
of zeroing in on aggressive cancers than the 
standard PSA blood test (ref. 4). Next, they plan to test
whether adding an assay for another molecule 
associated with aggressive cancer, SchLAP1, will 
further improve the test. 


MAKING SENSE OF SCREENING 
More informative screening methods are good 
as far as they go, but researchers are also search- 
ing for another piece of the puzzle: how to 
improve the analysis of tissue taken in biopsies 
after a positive screening test. These advances 
would allow physicians to better distinguish 
which cancers require immediate treatment, 
and which can be monitored — an approach 
known as active surveillance (see page S126).

Oncologists currently evaluate prostate
biopsies by Gleason grading, a method of
scoring prostate tissue on a scale of 2 to 10 by how
abnormal its cells appear (see ‘Scoring cancer’). 
Prostate tissue with a Gleason score of 5 or 
below is generally quiescent; tissue with a score 
of 8 or above requires immediate treatment. 
But most common prostate tumours score a 
middle-of-the-road 6 or 7, and these are more 
vexing to deal with. Usually, grade 6 tumours 
can be safely managed with active surveillance. 
But a few will prove to be aggressive. Grade 7 
tumours are more evenly split between those 
that are aggressive and those that are not. 
“There’s a lot in the kind of grey zone that we 
don’t know,” says Jack Cuzick, an epidemiolo- 
gist at Queen Mary University of London. 
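The arithmetic behind the Gleason score is simple enough to sketch. The Python below assumes, as the ‘Scoring cancer’ box describes, that two grades of 1 to 5 are added to give a score between 2 and 10. It then maps that score onto the rough bands quoted in this section: 5 or below is generally quiescent, 6 or 7 is a grey zone, and 8 or above requires immediate treatment. The function names are hypothetical. This is an illustration of the article's description, not a clinical tool.

```python
def gleason_score(first_grade: int, second_grade: int) -> int:
    """Combine two pathology grades (each 1-5) into a Gleason score (2-10)."""
    for grade in (first_grade, second_grade):
        if not 1 <= grade <= 5:
            raise ValueError(f"grade must be between 1 and 5, got {grade}")
    return first_grade + second_grade


def risk_band(score: int) -> str:
    """Map a Gleason score onto the rough bands described in the article."""
    if not 2 <= score <= 10:
        raise ValueError(f"score must be between 2 and 10, got {score}")
    if score <= 5:
        return "generally quiescent"
    if score >= 8:
        return "requires immediate treatment"
    # Scores of 6 and 7: usually suitable for surveillance, but some prove aggressive.
    return "grey zone"


if __name__ == "__main__":
    for grades in [(2, 3), (3, 3), (3, 4), (4, 4)]:
        score = gleason_score(*grades)
        print(grades, score, risk_band(score))
```

Scores of 6 and 7 fall through to the grey-zone branch. That is exactly where the genomic and imaging tests discussed in this section are meant to add information.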
Cuzick and his team have evaluated a bio- 
marker known as the cell-cycle progression 
score, which measures the activity of genes 
related to cell division in biopsy tissue. The 
greater the rate of cell division, the more aggressive the tumour — a pattern that applies to many forms of cancer. In a study of 585 men with prostate cancer, the researchers showed that this approach provides additional information about which men with those intermediate Gleason-score biopsies are at risk of dying from prostate cancer over the course of ten years (ref. 5). “Our feeling is that the cell-cycle progression score is a huge step forward to resolve many of the controversial cases,” says Cuzick, who also consults for Myriad Genetics, the maker of one cell-cycle progression test.

SCORING CANCER
Prostate-cancer severity can be gauged by assessing how well differentiated the tissue appears under the microscope and grading it 1-5. This is done twice and the grades combined, giving a score of between 2 and 10.

The cell-cycle progression score is one of an 
increasing number of genomic tests for pros- 
tate cancer — others evaluate between one- and 
two-dozen genes associated with prostate can- 
cer. A weakness of these tests, however, is that 
they work best if they are applied to biopsies 
taken from the most aggressive part of the can- 
cer — and that is not always obvious. 

That is because many men with prostate 
cancer have multiple tumours of independent 
origin. These various tumours can differ in their 
aggressiveness. Studies of men who had their 
prostate removed have found that 15-40% of 
those diagnosed with low-grade cancer at 
biopsy actually have a more aggressive tumour 
elsewhere in the prostate. 

Backman and his team have developed a form 
of microscopy that they say could overcome 
this difficulty by allowing pathologists to see 
changes inside cells that are too small to resolve 
with standard microscopy. The researchers, 
who formed NanoCytomics in Evanston, Illinois,
to commercialize the technology, found
that non-cancerous tissue taken from prostates 
that contain Gleason grade 6 tumours that
turned out to be aggressive shows characteristic
nanoscale changes, especially in the packaging
of DNA in the cell nucleus (ref. 6). Prostates that
contain non-aggressive grade 6 tumours do not 
show these alterations. 

The advantage of this type of test is that 
doctors could potentially determine the 
aggressiveness of a tumour without having to
biopsy the tumour itself. “We don’t need to find 
the needle,” Backman says. “All we have to do is 
sample the haystack.” 


SMARTER BIOPSIES 

Finely locating the tumours within the prostate 
is still an option, though. This is the focus of 
a third category of efforts aimed at improving 
the prostate-biopsy procedure, which involves 
taking samples of tissue — usually 10 to 12, but 
sometimes as many as 50 — with a fine needle. 

Prostate biopsies have generally been per- 
formed with little information about exactly 
where in the prostate a sample comes from. This 
is because it is difficult to get a clear picture of 
the organ using standard imaging methods. But 
now, an approach known as multiparametric 
magnetic resonance imaging (MRI) is beginning
to change that (ref. 7). The procedure combines
three techniques to generate a fuller picture of 
prostate anatomy and function. “We can go 
after the area that we think is most likely to have 
high-grade cancer,” Tomlins says.

Earlier this year, researchers found that 
oncologists locate more high-grade tumours 
when aided by multiparametric MRI than 
with standard biopsy procedures’. “It decreases 
the risks associated with active surveillance,” 
says the study leader Peter Black, a urological 
oncologist at Vancouver General Hospital in 
Canada. “You're able to take these patients out 
of the active surveillance pool and treat them.”

The technique also makes it possible to fol- 
low the development of a specific tumour and 
repeatedly biopsy it over time. This capability 
could help to address some basic questions 
about prostate cancer — with implications 
for treatment. “For example, do low-grade 
tumours routinely turn into higher-grade or 
more aggressive tumours?” Tomlins says. “It’s a 
crucial question because it totally changes how 
we predict whether cancers are going to be indo- 
lent or aggressive.” 

The challenge now is to bring together these 
varied strands of research, because the new 
biomarkers and testing strategies have largely 
been developed in isolation from each other. 
“Very little has been done to see if these can add 
to each other and how much we would gain by 
doing that,” Lilja says. So even as techniques that
may yield a better understanding of a patient's 
prognosis begin to roll out, scientists are aiming 
at the next round of improvements.


Sarah DeWeerdt is a freelance science writer 
in Seattle, Washington. 
Having opted for active surveillance, Bill Wilson has avoided surgery and continues with his busy lifestyle.


TREATMENT 


When less is more 


Surveillance is becoming a watchword for men with 
less-aggressive prostate cancer. If and when the disease 
progresses, new and newly timed therapies are at hand.


BY MEREDITH WADMAN 


When Bill Wilson learned that he had
prostate cancer in 2011, he wanted
to race to the nearest operating theatre.
Wilson, a 71-year-old former IBM executive
from St. Michael’s, Maryland, says his first
thought was: “I'm going to get it out of there.”
But his urologist encouraged him to talk to
other specialists, and so Wilson visited Ballentine
Carter, a prostate-cancer researcher
at Johns Hopkins University in Baltimore,
Maryland. Carter urged him to consider a 
more conservative approach called active sur- 
veillance. This involved Carter simply moni- 
toring the tumour over time; treatment would 
be launched only if the disease progressed. 

“A huge weight was lifted off my shoulders 
immediately,” says Wilson, who had feared 
both the ordeal of surgery and its common side 
effects, including incontinence and impotence. 
“My cancer was not life-threatening and might 
never be; therefore, I did not have to immediately
have aggressive treatment.”

Aggressive treatment is common: according 
to a recent report’, 50% of US men diagnosed 
with low-risk prostate cancer between 2010 
and 2013 underwent radical prostatectomy 
(surgical removal of the prostate and sur- 
rounding lymph nodes). However, active sur- 
veillance is being offered by a growing number 
of physicians, and taken up by a rising number 
of men (see ‘Active prime time’). A large study
in Sweden, led by Stacy Loeb, a New York Uni- 
versity urologist, found that nearly half of men 
diagnosed with low-risk disease between 1998 
and 2011 opted for active surveillance, with the 
proportion increasing over this time period’. 
In the United States, the turn toward active 
surveillance for low-risk disease has been dra- 
matic, growing from single-digit percentages 
in the late 1990s to 40% between 2010 and 
2013 (ref. 1). 

Disagreements remain on which cancers are 
best suited to active surveillance, and on how 
to monitor those men selected. Nonetheless, 
active surveillance “has spiked and become 
prime time”, says Loeb. And for men whose
disease does progress, new treatments and 
protocols are lengthening and improving the 
quality of their remaining life. 


A BALANCING ACT 

According to criteria set by Carter’s Johns 
Hopkins group, Wilson was a good candidate 
for active surveillance: his tumour cells did 
not look aggressive under the microscope; the 
tumour could not be felt by digital rectal exam 
and was only picked up by needle biopsy; and 
he had relatively low levels of prostate-specific 
antigen (PSA), a blood marker that is a proxy 
for the presence of prostate cancer. (Wilson
underwent the biopsy only because his PSA
was found to be slightly elevated: 0.57 nanograms
per millilitre above the threshold of
4 ng ml−1.)

The active-surveillance regimen requires 
Wilson to visit Johns Hopkins every 
six months for a digital rectal exam and a blood 
test. Annually, he undergoes either a biopsy or 
a magnetic resonance imaging scan, for a more 
detailed inspection. His most recent biopsy 
worried Wilson slightly because small areas of 
cancer were found in 3 of the 12 tissue samples 
taken, up from 2 of the 12 at diagnosis. Still, the 
cancer cells did not appear more aggressive, 
and his PSA levels remained reassuringly low. 

The surge in uptake of active surveillance 
is due, in part, to the response of a medical 
community that was roundly criticized for 
overtreatment of a disease that is too readily 
identified by PSA screening (see page S120).
By offering active surveillance as a conserva- 
tive way to manage men at lower risk, clini- 
cians hope that the balance of harms and 
benefits from PSA screening will shift. “The 
acceptance of surveillance is going to be a 
crucial piece of rehabilitating PSA screening,” 
says Laurence Klotz, a urologist at Sunnybrook
Health Sciences Centre at the University of 
Toronto, Canada. 

Klotz’s group has published some of the 
longest-term data on the results of active 
surveillance, which it began offering in 1995 
(ref. 3). The Toronto group followed 993 men, 
most of whom had low-risk tumours. It also 
included some men with slightly higher-risk 
tumours who had other significant illnesses 
and less than ten years of life expectancy. The 
group was followed for a median of 6.4 years; 
1.5% died of prostate cancer. 

For men with low-risk tumours, the com- 
parable outcome after surgical removal of the 
prostate is slightly better. A 2011 study involv- 
ing 24,000 men showed that, after 15 years of 
follow-up, between 0.2% and 1.2% on average 
had died of prostate cancer*. But in the same 
study, men with slightly higher-risk tumours 
— equivalent to the riskier subgroup in the 
Toronto cohort — did not fare as well: their 
average 15-year prostate-cancer-specific mor- 
tality ranged from 4.2% to 6.5%. 

Carter’s team published its results in 
August®. Of the 1,298 men with low- or very- 
low-risk disease placed on active surveillance 
since 1995, only 0.1% had died of prostate 
cancer at 10 years of follow-up; at 15 years, the 
figure was unchanged. 

“The take-home message is, in well-chosen 
patients, active surveillance is safe,” says Fred 
Saad, a urologist at the University of Mon- 
treal Hospital Centre in Canada. However, 
the Johns Hopkins and Toronto studies also 
highlight one of the sticking points that active- 
surveillance advocates are still wrestling with: 
inclusion criteria. The Johns Hopkins group 
was more conservative than the Toronto 
team: in Baltimore, only low-risk men were 
enrolled, which could help to explain the bet- 
ter results. In a 2014 
review of ten active- 
surveillance studies, 
the approach seemed 
to reduce over- 
treatment without 
compromising men’s 
ten-year cancer sur- 
vival, but the authors 
stated that the data are not yet mature enough 
for definitive conclusions to be drawn. Fur- 
thermore, they wrote bluntly, current tools for 
selecting patients and monitoring the disease 
are “inadequate and imprecise”. 

Freddie Hamdy, who specializes in prostate 
and bladder cancer at the University of Oxford, 
UK, notes that there are multiple active sur- 
veillance protocols in use at different centres, 
and many ways of interpreting them. “How are 
you going to detect or decide that the patient 
should no longer be on active surveillance 
because it’s not safe?” he says. “That’s the real 
challenge.” 

Hamdy hopes to shed new light on this
question with a first-of-its-kind, randomized
controlled trial in which 1,643 men newly
diagnosed with localized prostate cancer are
randomly assigned to receive active monitoring,
surgery or radiation therapy. Hamdy’s
group plans to report results of the ProtecT
trial, showing disease-specific survival rates
at a median ten years’ follow-up, as early as
spring 2016.

In US men diagnosed with low-risk prostate cancer,
the number choosing active surveillance is rising.


SECOND ACTS 

For men whose disease at diagnosis is too 
aggressive for active surveillance, treatment 
begins with some combination of surgery, 
radiation and medication — the first-line 
treatments for decades. Because prostate- 
cancer cells are stimulated to grow by testos- 
terone, more than 90% of which is made in 
the testes, one approach is surgical castration: 
the operative removal of the testes. (The rest 
of the body’s testosterone is made in the adre- 
nal glands.) The same end can be achieved 
with medical castration — the use of drugs 
that suppress the release of hormones that 
stimulate testosterone production. 

Nearly all tumours eventually become resist- 
ant to these testosterone-lowering approaches, 
and when they spread, they are described as 
metastatic castration-resistant prostate can- 
cer. For men in this stage, the past five years 
have seen remarkable advances in both new, 
life-prolonging drugs and in better ways to use 
older agents. 

The drugs abiraterone and enzalutamide 
were first approved in 2011 and 2012, respec- 
tively. Abiraterone works primarily by inter- 
fering with testosterone synthesis, whereas 
enzalutamide prevents the hormone from 
binding to the androgen receptor. Although 
neither is curative, the advent of these oral 
agents has improved both survival and quality
of life for men with advanced disease. “It is
fantastic to give drugs that have almost no side
effects and that make men feel better, have less
pain and live longer,” says Saad. Nonetheless,
over time, most patients develop resistance to 
these drugs — a reality that has galvanized the 
hunt for new therapeutics (see page S128). 

In another positive development, strik- 
ing new evidence shows that a change in the 
timing of chemotherapy can buy men with 
metastatic disease many more months of life. 
Docetaxel — a chemotherapy agent that sup- 
presses cancer-cell division — was approved 
in 2004 for use after medical or surgical castra- 
tion has failed. But results of the CHAARTED 
trial published this year showed that the lives of 
men with metastatic disease were significantly 
extended when they were given docetaxel ear- 
lier, simultaneously with castration’. They 
lived a median of 14 months longer than men 
who underwent only medical or surgical cas- 
tration. For men with the most extensive dis- 
ease, the difference was still more pronounced: 
17 months. The results “were really dramatic”,
says Matthew Cooperberg, a prostate-cancer 
specialist at the University of California, San 
Francisco. “You don’t see a 17-month survival 
advantage very often in these types of trials.” 

The CHAARTED trial was led by 
Christopher Sweeney, a physician at the 
Dana-Farber Cancer Institute in Boston, 
Massachusetts. He suggests that the success 
came from deploying drugs with different 
mechanisms of action, thus targeting both 
the testosterone-sensitive and testosterone- 
insensitive cancer cells simultaneously. 
“That’s speculation, but something like that 
might be happening,” says Sweeney. He and 
others are now running trials to see whether 
patients will benefit when the newer drugs, 
enzalutamide and abiraterone, are likewise 
deployed alongside castration earlier in the 
course of metastatic disease. 

Meanwhile, having been spared aggressive 
treatment, Wilson is taking the ‘active’ part of 
his active surveillance seriously. He spends his 
time sailing his 35-foot sloop, Adagio, training 
his dog Jeter for agility contests, and getting 
out into the countryside — his disease far from 
his mind. “I just came back from hiking in the 
mountains in Arizona and put the 40-year-olds 
to shame.” 


Meredith Wadman is a freelance writer based 
in Washington, DC, and an editorial fellow at 
New America. 
Resistance
fighters 


Strategies to destroy treatment-defying tumours in men 
with prostate cancer are beginning to make a difference. 


BY NEIL SAVAGE 


When the patient entered a trial of
an experimental prostate-cancer
treatment, he was in bad shape.
The disease had spread to at least ten different
parts of his body, including his arm
and leg bones, and his hip, spine and ribs. 
The tumours caused him so much discom- 
fort that, despite heavy use of pain-relieving 
medication, he was unable to sit up. Chemo- 
therapy had failed to halt the spread of the 
cancer. But now, nearly seven years after fin- 
ishing the trial, the patient’s tumours have 
disappeared, his pain has vanished and his 
blood levels of prostate-specific antigen 
(PSA; a protein biomarker used to moni- 
tor malignancy) give no indication of the 
disease. 

“We always are cautious using the word
‘cure’,” says Fred Saad, a prostate-cancer
researcher at the University of Montreal
in Canada, who ran the study. “There are
diseases we have cured in a very advanced
stage, like lymphoma, like testicular cancer,”
he says. But despite individual successes,
advanced prostate cancer is still considered
to be incurable.

Many men with the disease have tumours 
that grow so slowly that they never cause a 
problem. Others can be cured by treating 
the tumour within the prostate gland. But 
in some, the cancer spreads to elsewhere in 
the body, usually to the bones. The first line 
of treatment for these men is to suppress the 
male sex hormones (androgens), such as tes- 
tosterone, that stimulate prostate tumours 
to grow — a form of chemical castration. 
Within a year or two, however, tumours 
become resistant to this treatment. 

Until the early 2000s, there were no availa- 
ble treatment options for castration-resistant 
prostate cancer (CRPC). Since 2010, a hand- 
ful of therapeutic strategies for treating 
CRPC have emerged. But at best, they add 
a few months to patients’ median survival 
time. So researchers are working to understand
the mechanisms by which prostate
cancer is able to resist efforts to overcome it, 
and to develop approaches that can perma- 
nently defeat the disease. 

Saad’s study is one such attempt’. The 
phase II trial focused on men with meta- 
static CRPC whose condition had wors- 
ened despite undergoing chemotherapy 
with docetaxel, a drug from the taxane fam- 
ily. The researchers focused on clusterin, a 
protein that increases in concentration when 
cells are stressed and seems to protect the 
cells from damaging agents. Researchers 
suspect that clusterin helps various types of 
tumour to become resistant to drugs used in 
chemotherapy. By inhibiting clusterin with a 
drug known as custirsen, the team hoped to 
once again make CRPC tumours vulnerable 
to the effects of chemotherapy. 


REMARKABLE RESPONSE 

The results of the trial were encouraging. 
Men who received custirsen together with 
docetaxel and the immunosuppressant drug 
prednisone showed a reduction in both 
pain and PSA levels. Saad’s patient with 
the impressive results, who was 62 when 
he started the trial, had seen his PSA level 
shoot up from 74 to 115 nanograms per millilitre
in the 3 weeks before treatment (a PSA
level below 4 ng ml−1 is generally considered
normal; a man who has had his prostate
removed and is now cancer free should have
a level of 0). Within 2 weeks of starting the
trial, his PSA levels had dropped to around
70 ng ml−1, and after 24 weeks, they had
plummeted to less than 0.03 ng ml−1. Seven
years on, the patient’s PSA level is undetect- 
able. Although this particular case does not 
prove that custirsen can cure prostate cancer, 
Saad thinks that it is remarkable. 

The larger story of custirsen — an example 
of a DNA-based ‘antisense’ drug that binds 
to RNA and switches a gene off — is less 
clear. A phase III trial that used custirsen 
alongside docetaxel and prednisone showed 
no statistically significant improvement in 
the survival of participants with advanced 
prostate cancer compared with those who 
received the same treatment, but without 
custirsen. The results of another phase III 
trial, which combines custirsen and pred- 
nisone with a different anticancer drug, 
cabazitaxel, are expected by early 2016. 

Saad says that the key to finding effective 
treatments for advanced prostate cancer 
lies in identifying those men — like his 
star patient — who will respond to a given 
therapy, perhaps because of a particu- 
lar mutation or variation in their tumour. 
That requires determining which molecu- 
lar mechanisms help to confer resistance to 
drugs in certain people, and finding ways to 
test for them. Large studies that are unable to 
identify subgroups of patients who respond 
to a therapy can lead researchers to dismiss
drugs that would work well in the right indi- 
viduals. “The ones that are actually respond- 
ing are drowned in a sea of non-responders,” 
says Saad. 


SPLICE VARIANTS 

The resistance of prostate cancer to chemical 
castration develops by several routes. One 
biomarker of a particular mechanism of 
resistance has already been found — a recep- 
tor protein that binds androgens within the 
cell. Two new anti-androgen drugs, enzalu- 
tamide and abiraterone, can extend the lives 
of men with metastatic prostate cancer by up 
to three years. Eventually, those drugs stop 
working in almost all men — but 20-40% 
of patients never respond at all’. The reason 
for this initial resistance is a variation in the 
messenger RNA sequence that is used as a 
template for building the androgen-receptor 
protein itself. 

To make the receptor, the DNA of the 
androgen-receptor gene is first converted 
into a sequence of RNA that encodes all 
parts of the receptor protein. Any RNA 
that does not code for protein is cut out and 
the remaining pieces of RNA are joined or 
‘spliced’ together to produce the receptor 
template. Occasionally, pieces of protein- 
coding RNA are also removed during splic- 
ing, which creates different versions — splice 
variants — of the receptor template. In one, 
androgen-receptor variant 7 (AR-V7), the 
receptor is missing its ligand-binding area, 
called the carboxyl terminal. This is the region
to which androgens normally attach; without it,
the drugs have nothing to interfere with and
are powerless. However, the area of
the androgen receptor that triggers the cell 
to divide, found at the protein’s opposite end, 
still works. “It can cause the cancer cell to 
grow and divide even without testosterone 
being present,” says Emmanuel Antonarakis, 
an oncologist at Johns Hopkins Sidney Kim- 
mel Comprehensive Cancer Center in Balti- 
more, Maryland, who helped to identify the 
variant. 

Using a blood test, Antonarakis has 
compared men whose tumours contain 
AR-V7 with those whose tumours do not. 
Whereas men who tested negative for AR-V7
responded equally well to both anti-andro- 
gen drugs and chemotherapy with taxanes, 
those with AR-V7 did not respond to the 
anti-androgen drugs. But they did respond 
to chemotherapy with taxanes, which dis- 
rupt the microtubules that help cells to 
divide. Antonarakis’s finding is supported by 
a study from the Erasmus University Medi- 
cal Center Rotterdam in the Netherlands, in 
which investigators showed that AR-V7 does 
not diminish the effect of the taxane cabazi- 
taxel’. A study from University Hospital Ulm 
in Germany confirmed the link between the 
variant and androgen resistance’. If these
findings hold up, Antonarakis says that men
with AR-V7 could skip the anti-androgen 
treatment and go straight for chemotherapy. 
Men who test negative can choose between 
the two. 

Soon, there might also be more treatment 
options for men with AR-V7. The drug gale- 
terone, for example, the subject of a phase II 
trial, works in three different ways. Like 
enzalutamide, it prevents androgens from 
binding to their receptors. And like abira- 
terone, it interferes with the production of 
testosterone. But galeterone also degrades 
the androgen receptor itself — an action
that could prevent the … testing has shown that
PSA levels dropped in men with CRPC who
took galeterone during phase II trials. Initial
results of a phase III trial, which focuses spe- 
cifically on men with AR-V7, are expected 
by the end of 2016. 

Essa Pharma of Vancouver, Canada, is 
taking a different approach to the problem 
of resistance with its drug EPI-506, cur- 
rently being prepared for phase I/II testing. 
Although most anti-androgen drugs target 
the end of the androgen receptor to which 
androgens bind, Essa’s drug is the first to tar- 
get the receptor’s opposite end, which can 
interact with the DNA of the cell. By block- 
ing this part of the receptor, the drug could 
prevent it from doing its job — stopping the 
cancer in its tracks. “If it can’t bind to DNA,
it can’t switch on these genes to divide, mul- 
tiply and spread,” Antonarakis says. 


DNA REPAIR

Splice variants are not the only way that 
prostate cancer can become resistant to 
anti-androgen drugs. When hit with a 
therapy, the disease — like any other can- 
cer — mutates and develops mechanisms 
to help it to survive and grow. And anti- 
androgen drugs such as enzalutamide and 
abiraterone can inadvertently switch on the 
cancer-promoting mechanisms that andro- 
gens normally suppress. “You activate a 
sort of replacement pathway,” says Timothy 
Thompson, an oncologist who is director of 
prostate-cancer research at the University 
of Texas MD Anderson Cancer Center in 
Houston. 

Anti-androgen drugs actually “unrepress” 
oncogenes such as c-MYB, switching on 
pathways that help to promote the growth of 
cancer. In fact, drugs such as enzalutamide 
seem to stimulate mechanisms that repair 
DNA damage’ — not enough to create normal
cells, but sufficient to allow cancer cells
to multiply and spread. 

Researchers are searching for specific 
steps in the c-MYB pathway that they could 
target with new or existing drugs. Of par- 
ticular interest is a class of enzymes called 
poly(ADP-ribose) polymerases, known as 
PARPs, which play a part in repairing dam- 
aged DNA®. Drugs that inhibit PARPs might 
disrupt the repair process and make cells 
more vulnerable to other forms of chemo- 
therapy. PARP inhibitors are already being 
tested for the treatment of patients with 
breast cancer who have mutations in the 
genes BRCA1 and BRCA2, and in Decem- 
ber 2014, olaparib became the first such drug 
to be approved by the US Food and Drug 
Administration for treating ovarian cancers 
with the same BRCA mutations. 

In April 2015, researchers from the Insti- 
tute of Cancer Research and the Royal 
Marsden NHS Foundation Trust in London 
presented the results of a phase II trial of 
olaparib for men with metastatic prostate 
cancer. Lead researcher Johann De Bono 
says that a handful of patients showed 
“spectacular responses” to treatment with 
olaparib — their tumours disappeared from 
imaging scans. Others saw their PSA level 
cut in half. And all of the seven trial partici- 
pants who had mutations in the gene BRCA2 
responded to the drug in some way. 

Such discoveries could open the door to 
multipronged approaches in the fight against 
a disease for which there was no effective 
therapy just over a decade ago. That could 
revolutionize the treatment of advanced 
prostate cancer, says Saad, by bringing 
approaches in line with those for other 
cancers. “Prostate cancer is still one of the 
few, or only, solid tumours treated with a 
mono-treatment approach,” he says. “Where 
we need to go in the future is combining 
therapies.” 

Although it might be a long time before 
the lives of most men with advanced pros- 
tate cancer can be significantly prolonged, 
Antonarakis agrees that combining therapies 
that block androgen receptors and destroy 
resistance mechanisms will soon stop the 
disease from being 100% fatal. “In the next 
five to ten years,” he predicts, “we will be 
able to cure a small percentage of metastatic 
castration-resistant prostate cancer.”


Neil Savage is a freelance science and 
technology writer in Lowell, Massachusetts. 
Inflammation is an underlying cause of many cancers — and prostate cancer might turn out to be one of their number.


BY KIRSTEN WEIR 


When Angelo De Marzo peers at
cancerous prostate tissue through
the lens of his microscope, he often
sees a total mess.

There are the cancer cells, of course, as well 
as abnormal cells thought to be precursors to 
cancer. There are also pockets of a third cell 
type: shrunken, withered cells that — despite 
their ailing appearance — are dividing rapidly. 
And surrounding that sickly stew are areas in 
which inflammation has set in for the long run. 

But that might not be by accident. Inflam- 
mation in the prostate gland is common, and 
it is even more common in men with prostate 
cancer. De Marzo — a pathologist and oncolo- 
gist at the Johns Hopkins University School of 
Medicine in Baltimore, Maryland — is part of a
growing group of researchers who suspect that 
inflammation could be both a symptom and a 
cause of the disease. If so, physicians might one 
day be able to treat or even prevent prostate can- 
cer by turning down the volume of the body’s 
immune response. 


DOUBLE-EDGED SWORD 

The immune system is a fickle friend. It protects
us against invading pathogens and attempts to 
snuff out precancerous cells before they run 
wild. Inflammation lies at the heart of the 
immune response. But in the rush to attack 
potential pathogens, inflammation can cause 
collateral damage. “It's a two-edged sword,” De
Marzo says. 

In the past two decades, scientists have begun 
to determine precisely how inflammation over 
an extended period of time could lead to the 
development of tumours. The classic example is 
gastric cancer, which can be caused by persistent
inflammation that is triggered by the bacterium
Helicobacter pylori. Inflammation is also impli- 
cated in cancers of the liver, bladder and colon. 
As many as one-fifth of all cancers might be 
attributable to inflammation, according to Scott 
Lucia, a pathologist at the University of Colo- 
rado’s Anschutz Medical Campus in Aurora. 

Results from animals and humans suggest 
that prostate cancer also belongs on that list. 
“There's no definitive smoking gun that inflammation
causes prostate cancer,” says De Marzo,
but “there’s a lot of evidence building”.

One reason for the uncertainty is that most 
samples of prostate tissue that are available for 
researchers to study have been removed from 
patients because of a medical problem — usu- 
ally, in a biopsy performed after the discovery 
of an elevated level of prostate-specific antigen 
(PSA) in the blood. PSA is produced by the 
prostate gland, and a high concentration of the 
protein indicates that a person could have pros- 
tate cancer. But chronic inflammation alone can 
also raise PSA levels. As a consequence, men 
with inflamed prostates might be more likely to 
undergo biopsies that detect small tumours that 
would otherwise have gone unnoticed. If so, the 
association between inflammation and prostate 
cancer could be just an illusion. 

De Marzo, Lucia and colleagues found a 
way to avoid this ‘ascertainment bias’ by using 
data from a fortuitously designed clinical trial. 
Between 1993 and 2004, the Prostate Cancer 
Prevention Trial set out to determine whether 
the drug finasteride could prevent prostate 
cancer. All participants who did not have cause 
for biopsy during the course of the trial were 
required to undergo an end-of-study biopsy, 
even if their PSA levels were low. By examining 
samples of benign tissue taken from the pros- 
tates of 400 men who were given a placebo in 
the trial, around half of
whom had been diag- 
nosed with prostate 
cancer, De Marzo and 
Lucia’s team discovered that inflammation was 
very common’. Indeed, 78% of the men who 
were free from cancer showed signs of inflam- 
mation. However, inflammation was still much 
more likely to be found in men with cancer, 
appearing in 86% of samples from men with the 
disease, and 88% of samples from men with the 
most aggressive, high-grade cancer. “There is a 
relationship between cancer and inflammation,”
says Lucia. “As the amount of inflammation goes
up, the odds ratio of having cancer — and in
particular, high-grade cancer — went up.”

Although De Marzo and Lucia’s study con- 
firmed an association between inflammation 
and prostate cancer, it was unable to answer the 
question of which comes first. “With something 
as common as inflammation, you see these 
relationships, but you don’t know if they're 
causative,” says Lucia. “If we could remove
inflammation, would we lower the risk of prostate
cancer? We don't have a means of doing that
right now.”


Acne-causing bacteria 
are linked to prostate- 
cancer mortality. 


INFECTIONS AND DIET 

If inflammation does contribute to the development
of prostate cancer, it is logical to ask what
might be the cause. Infection is the leading sus- 
pect, and has been for some time. 

In the 1950s, researchers observed that 
prostate cancer was more common in uncir- 
cumcised men’. This finding led them to 
propose that prostate cancer might be trig- 
gered by sexually transmitted pathogens, 
which they reasoned were more likely to be 
present in uncircumcised men. The hypoth- 
esis has since been supported by a number of 
population-based studies. In particular, the
bacterial infections gonorrhoea and chlamydia 
have been linked to an increase in the risk of 
developing prostate cancer, as has infection 
with the protozoan Trichomonas vaginalis. 

Such infections can now be treated quickly 
with antibiotics. But rodent models hint that a 
short-term infection can launch what becomes 
an extended, or chronic, inflammatory 
response. Karen Sfanos, a pathologist at Johns 
Hopkins University School of Medicine, found 
that after a rat or mouse is cleared of a bacterial 
infection of the prostate, inflammation in the 
gland can persist for the rest of the animal’s life. 
“Even a single infection seems to set up some 
kind of chronic inflammatory event,” she says.

Sexually transmitted bacteria and protozoa 
are not the only pathogens that make their way 
into the prostate, thanks to the gland’s loca- 
tion in the body. “The urethra actually passes 
through the prostate,” Sfanos says. “There could
be a very rich flora that’s poised right there, 
where the prostate sits, that could continually 
be a source of exposure to microorganisms.”

Sfanos has shown that strains of the bacterium Escherichia coli that are associated with urinary tract infections can cause an inflammatory response in the prostate of rodents. According to studies in men, so can Propionibacterium acnes, the bacterium associated with the common skin condition acne. The culturing of P. acnes from inflamed prostate tissue led to the finding that men with a history of severe acne had a significantly increased risk of death from prostate cancer’.

Although infection is likely to cause chronic inflammation of the prostate, another suspect is the food on your plate. Prostate cancer is much more common in the United States and Western Europe than in Asia. “Diet could be one of the factors that explain the differences in rates,” says Elizabeth Platz, an epidemiologist at Johns Hopkins Bloomberg School of Public Health.

Research has shown that the consumption of certain foods can raise or lower the risk of developing prostate cancer. For example, a diet rich in red meat (and
that is abundant in well-cooked meat — developed cancer in the ventral lobe of the prostate’. Notably, the team also found that inflammatory cells were more plentiful in the same lobe. Foods with anti-inflammatory properties, such as soya beans and green tea, however, have been shown to decrease the incidence of prostate cancer in animals. Those foods have also been linked to a lower risk of developing prostate cancer in epidemiological studies in humans.

De Marzo thinks that a number of factors probably come together to create chronic inflammation in the prostate. “Something seems to be targeting the prostate,” he says. “We suspect it’s a combination of infectious agents and diet.”

Pockets of shrivelled cells called proliferative inflammatory atrophy may be a precursor to prostatic intraepithelial neoplasia and prostate cancer. These lesions are often associated with chronic inflammation.


CARCINOGENIC OOZE 

De Marzo began to study inflammation in the prostate after noticing the strange pockets of shrivelled cells that he dubbed proliferative inflammatory atrophy (PIA). Despite their appearance, cells in PIA lesions proliferate at almost the same rate as cancer cells. Sometimes, PIA cells seem to merge with abnormal cells from regions of prostatic intraepithelial neoplasia (PIN), which are also thought to be a precursor to prostate cancer (see ‘Cancer culprit’). And often, signs of chronic inflammation lurk nearby. “It looks like the inflammation might come first, and these lesions can result,” De Marzo says.

Inflammatory cells can elicit the production of DNA-damaging oxidants. They also secrete signalling proteins called cytokines, which have an important role in regulating surrounding cells and can cause them to proliferate, De Marzo says. In other words, there are signs of oxidative stress, genetic instability and runaway cell division in areas where PIA, PIN and inflammatory cells huddle. “You’re setting up the primordial ooze for carcinogenesis,” says Lucia.

But not all inflammatory cells fight for team 
cancer. Some prevent precancerous lesions from 
taking hold. Researchers still have a long way to 
go to understand which cells, or combinations 
of cells, are helpful and which cause harm. 

Lucia is focusing on a cytokine known as growth differentiation factor 15 (GDF-15), which is involved in regulating inflammatory pathways. GDF-15, he says, has been shown to slow the growth of tumours in the colon in animal studies. With James Lambert, a pathologist also at the University of Colorado’s Anschutz Medical Campus, Lucia found that whereas GDF-15 was common in healthy prostate tissue, it was sparse in samples with chronic inflammation®. He suspects that the protein acts as a brake on inflammation — a useful tool for a gland that is situated so close to the urethra and the potential pathogens it harbours. “It could be that if GDF-15 is inhibited, chronic inflammation develops,” he says. Lucia is now exploring how GDF-15 might inhibit the tumour-promoting factors produced by some inflammatory cells, and possibly help to prevent prostate cancer.

Sfanos, meanwhile, is moving in a different direction by homing in on inflammatory cells that might increase the risk of developing prostate cancer — a daunting task. She is attempting to count and map the locations of different types of inflammatory cell in the prostate, starting with those that are known to be associated with other cancers. Eventually, Sfanos hopes that her work will reveal which combinations of cell types are harmful and which might be protective. “We hope to understand what is a good mix of inflammatory cell types versus what seems to be not so good, as far as the development of advanced disease,” Sfanos says.

Physicians might then be able to run a test that determines the mixture of immune cells that are present in the cancerous prostate of a patient. “If there are more inflammatory cells of a certain type, or more immune cells in general, does that give us information about prognosis?” asks Platz. “If it does, perhaps those men need more or less surveillance going forward.”

Such work on inflammation could have important implications for the prevention of prostate cancer. “We don’t think it’s a normal process for the prostate to grow too large or to get cancer,” De Marzo says — a point of view that he acknowledges is counterintuitive, given the frequency of these conditions. “If it turns out there is an infectious cause — or two or three or four — and we could treat those, that might ultimately prevent a lot of disease.” ■


Kirsten Weir is a freelance science writer in 
Minneapolis, Minnesota. 
Surgeon Declan Murphy positions a robotic device above a patient’s abdomen so that a 3D telescope and surgical instruments can be installed.


Q&A: Declan Murphy 
A robot convert 


In 2004, surgeon Declan Murphy was not convinced that using a robot to remove a cancer-riddled prostate was a significant improvement on keyhole, or laparoscopic, surgery. Eight hundred robotic procedures later, he has not only changed his mind, but is now director of Robotic Surgery at the Peter MacCallum Cancer Centre in Melbourne, Australia.


How does robotic surgery compare with other surgical methods for removing a cancerous prostate?
There are some benefits to robotic radical prostatectomy over open surgery that are difficult to argue with. First, men can leave hospital much sooner: 85% of patients go home the next day. Second, blood transfusion rates are significantly lower than for open surgery. Third, general surgical complications such as clots and infections also seem to be less frequent — and that is because it is minimally invasive surgery, like laparoscopic surgery.
Conventional laparoscopic prostatectomy, with a 2D view and straight instruments, is technically challenging, and this leads to longer operative times. A key paper published a few years ago (A. J. Vickers et al. Lancet Oncol. 10, 475–480; 2009) showed that the learning curve for laparoscopic prostatectomy was very long, much longer than for open surgery. The authors reported that around 750 cases were needed — which is a lifetime’s work for many surgeons — before you would achieve your lowest cancer recurrence rates.


How difficult is it to train with the robotic system?

The learning curve is much shorter than for laparoscopic prostatectomy — my colleagues and I estimated it was upwards of 80 cases. The device has some fantastic training features, such as a dual console, so it is like learning how to drive a car. There is also a touch screen that consultants can draw on so that our annotations come up on the robot console, showing the trainee surgeon where to cut and where not to cut. However, robotic radical prostatectomy is a complex procedure that requires modular training and should be done only by specialists. My department has an extremely strict list of requirements, and we frequently deny people access because they don’t have the credentials.


Why did you switch from laparoscopic to robotic surgery?

I was sceptical about the robot when I first had experience of it as a urology trainee at Guy’s Hospital in London in 2004. I was of the opinion that you don’t need a robot to do these operations; you just work harder and train harder with laparoscopic tools. But my view changed in 2007 when I undertook fellowship training in Melbourne and began to see data on the outcomes of robotic surgery emerge. I realized that I would be able to achieve much better results for my patients by performing robotic prostatectomy rather than conventional laparoscopic or open surgery. You don’t suddenly become a fantastic surgeon just by using this device, though. The surgeon’s training and experience count for more than whether he or she is using the robot.

ALAN MOYLE


What are the advantages of robotic surgery from the surgeon’s perspective?

It is impossible to overstate how good the view is looking into this machine. The prostate is deep down in the pelvis behind the pubic bone, so it is difficult to get good views with open surgery, especially as there is more blood loss with this type of procedure. We have got used to very good views with laparoscopic surgery, but these are 2D — it’s like having one eye closed when you are trying to stitch. With the robotic device, you are
advantage is the range of movement of the instruments. With laparoscopic surgery, we have straight instruments that do not have a ‘wrist’ on them. But suturing is a very dexterous movement. The robotic system has wristed instruments: you can turn your hand in the machine and the needle turns — a much more intuitive interface.


Does robotic surgery mean better outcomes for patients with cancer?

The problem with prostate cancer is that the outcomes take quite a number of years to materialize. The short-term surrogates for measuring cancer outcomes are things like positive surgical margins — when cancer cells are found right up to the edge of the surgically removed tissue. If you have a positive surgical margin, you are five times more likely to need additional cancer treatment, such as radiotherapy, over the following two years. When we looked at data on 2,300 radical prostatectomies, we found a statistically significant 31% reduction in the number of patients with positive surgical margins after robotic prostatectomy.

We have also shown that there is a dramatic reduction in hospital stay after robotic surgery: from five days with open surgery down to just over one day. Furthermore, the blood transfusion rate for open surgery is 15%, whereas it’s practically 0% with robotic surgery.

There are, however, two other important areas for patients undergoing radical prostatectomy where we cannot claim that robotic surgery is clearly better: recovery of urinary continence and recovery of sexual function. These are major quality-of-life outcomes that are very important to patients — and it is not possible to say with any confidence that robotic surgery is any better than good open surgery by an experienced surgeon.

I get many patients who have had a biopsy taken or been offered open surgery and who are seeking a second opinion. I tell them that if they have come from a high-volume surgeon then, apart from the short-term outcomes of hospital stay length, blood transfusions and maybe margin rates, their longer-term cancer outcomes are going to be just as good with those performing open surgery as they would be with us. But the reality is that in many regions today, Australia included, most fellowship-trained, high-volume surgeons are using the robot, and the number of surgeons who have performed a large number of open procedures is dwindling.


Have there been any randomized clinical trials comparing the types of surgery?

This has been the major criticism over the years — we have failed to do randomized controlled trials. One such trial comparing robotic and open radical prostatectomy that has successfully recruited all of its patients is in Brisbane, Australia, but a report is not expected until early 2016. The Brisbane trial, however, is the exception, and in many respects the boat has sailed. Hundreds of thousands of robotic procedures have already been reported in the literature in observational retrospective series, so everyone has already read about them and made up their mind. It is now almost impossible to sell a randomized controlled clinical trial to patients, or indeed to surgeons. We all know robotic surgery is better from a technical point of view and for the other short-term outcomes, so nobody wants to have open surgery any more.

High-definition 3D images allow precise dissection.


What are the downsides of robotic surgery? 
The massive issue is the cost of the machine. It is made by a monopoly provider that has fiercely protected its patents — as it is entitled to do. The machines cost AUS$2–3 million (US$1.4–2.1 million), and there are also recurring costs; maintenance is about AUS$250,000 per year and the surgical instruments we use cost AUS$3,500 per operation. They are reusable, but only up to 10 times.

There is a practical difficulty as well. Although you have fantastic vision and magnification, there is no tactile feedback from the wristed instruments — you can’t feel anything — and surgery has in the past relied heavily on the sense of touch. However, the greatly superior vision more than makes up for this.


Are the costs of robotic surgery balanced by the benefits?

The costs of the machine can be offset by reductions in the length of hospital stay and number of blood transfusions, and there is a critical number of cases at which it becomes cost effective. In our model, that number is 140 cases. If you’re amortizing a AUS$3 million device over 7 years, including an annual maintenance contract, a really important part of diluting the cost is to have a high volume of surgery on the machine.
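The arithmetic behind diluting the cost can be sketched roughly. The Python below is a hypothetical illustration, not Murphy’s model: it assumes straight-line amortization of the AUS$3 million purchase over 7 years with no interest or resale value, adds the AUS$250,000 annual maintenance contract and AUS$3,500 of instruments per operation, and treats caseload as cases per year. It does not reproduce the 140-case break-even, because the interview gives no figures for the savings from shorter stays and fewer transfusions.

```python
# Hypothetical sketch: spreading a surgical robot's fixed costs over its caseload.
# Assumptions (not from the interview): straight-line amortization with no
# interest or resale value, and caseload measured in cases per year.

DEVICE_PRICE_AUD = 3_000_000      # purchase price quoted in the interview
AMORTIZATION_YEARS = 7            # amortization period quoted in the interview
ANNUAL_MAINTENANCE_AUD = 250_000  # yearly maintenance contract quoted in the interview
INSTRUMENTS_PER_CASE_AUD = 3_500  # instrument cost per operation quoted in the interview


def annual_fixed_cost() -> float:
    """Yearly share of the purchase price plus the maintenance contract."""
    return DEVICE_PRICE_AUD / AMORTIZATION_YEARS + ANNUAL_MAINTENANCE_AUD


def robot_cost_per_case(cases_per_year: int) -> float:
    """Fixed cost divided across the yearly caseload, plus per-operation instruments."""
    if cases_per_year <= 0:
        raise ValueError("cases_per_year must be positive")
    return annual_fixed_cost() / cases_per_year + INSTRUMENTS_PER_CASE_AUD


if __name__ == "__main__":
    # Compare a low, the quoted critical, and a high yearly caseload.
    for n in (50, 140, 300):
        print(f"{n:>4} cases/year: AUS${robot_cost_per_case(n):,.0f} per case")
```

Under these assumptions, 140 cases a year puts the device and maintenance at roughly AUS$4,800 per operation before instruments, and doubling the caseload halves that share, which is the volume effect Murphy describes.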

Radical prostatectomy numbers are decreasing; the reason is not to do with the robot cost, however, but with changes in early prostate-cancer screening, detection and patterns of care. The number of men being offered or asking for a prostate-specific antigen test has dramatically dropped, so the first reason why fewer radical prostatectomies are being done is that fewer men are being tested. Another reason for the decline is the rise in active surveillance as a management option for early prostate cancer (see page S126).


INTERVIEW BY BIANCA NOGRADY 


This interview has been edited for length and clarity. 
An immune one-two punch


Combination therapies that activate the immune system in complementary ways could help more men with prostate cancer to contain their disease long term.


BY KATHERINE BOURZAC 


When cancer immunologist Michael Curran was a postdoc, he made a discovery of the magnitude that scientists only dream about. He showed that two antibodies that unleash the immune system had a synergistic effect, bringing about the eradication of melanoma tumours in mice. What is more, this effect also worked in people. Curran and his colleagues published their mouse results in 2010 (ref. 1); subsequent clinical trials showed that the combination therapy is so effective at treating people with melanoma that some patients are “durably cured” of their cancer, he says.

Immunotherapy works well for people with melanoma, and researchers such as Curran, now an immunologist at the University of Texas MD Anderson Cancer Center in Houston, are trying to create similarly dramatic effects in other cancers. But Curran’s therapy does not work for prostate cancer — not even in mice. Immunotherapies are new, and researchers are still figuring out how they work, says Curran. There is one immunotherapy for prostate cancer approved for use, and only in the United States. Sipuleucel-T adds, on average, a few months to a man’s life. But anecdotally, oncologists report men who have undergone the therapy living for years without needing further treatment.

To make prostate-cancer-immunotherapy success stories more common, physicians and immunologists need to understand why some men respond to the treatment and some do not. Such insights will help them to predict which patients are most likely to benefit from these expensive treatments, and could guide the design of new versions that work better for more people. Several clinical trials are testing cancer vaccines (see ‘Immunotherapy on trial’), as well as therapies that combat a tumour’s tendency to muffle immune responses. Many of these trials are exploring combinations of therapies that act on different immune-system or cancer pathways, to make sure that tumour-killing T cells are fully equipped to do their work.

NATURE.COM
Read more on cancer immunotherapy at: go.nature.com/kvpgz


CHASING THE LONG TAIL 

For a patient whose prostate cancer has spread to the lungs, bone or elsewhere, the prognosis is bleak. Chemotherapy and radiation shrink tumours and extend life by a few months, but then they stop working — either because the tumour mutates to get around a targeted therapy or because patients are taken off the treatments owing to the side effects. Immunotherapy drugs can have longer-term effects when they work well, but so far that is rare for prostate cancer.

Sipuleucel-T is controversial. The median survival benefit is only four months” — about the same as conventional therapies — and it costs US$93,000. That kind of limited benefit and high cost is not unheard of for cancer drugs in the United States, but it is unusual. And sipuleucel-T is more complicated to administer than a conventional drug. Unlike most drugs, which come premade and can be sold off-the-shelf, sipuleucel-T is personalized. The patient’s white blood cells are separated from their blood and sent to a central processing facility. There, these cells are incubated with the enzyme prostatic acid phosphatase — to train them to seek out cancer cells that produce this protein. The cells are then returned to a local clinic and infused back into the patient. This process is done three times. And although the cell harvesting can be performed at any Red Cross blood bank in the United States, it is still much more complicated than writing a prescription for a pill or sending a patient to an infusion clinic for conventional chemotherapy, says Lawrence Fong, an immunologist who treats men with prostate cancer in his clinic at the University of California, San Francisco.

GARY NEILL

That complexity, and the expense that accompanies it, have brought about a backlash against sipuleucel-T. The therapy’s creator, Dendreon, based in Seattle, Washington, received US Food and Drug Administration approval to market sipuleucel-T in 2010. But when the drug’s poor sales figures were revealed just a year later, the company’s stock plunged 67% in a single day. In November 2014, Dendreon filed for bankruptcy; its assets were sold off the following February.

Montreal-based Valeant Pharmaceuticals, which picked up the drug, withdrew an application to market it in the European Union in May 2015. Questions were raised about the expense of the treatment and about the clinical trials, says Hardev Pandha, an oncologist at the University of Surrey, UK. “There were a few infusions and then that was it,” he says. “It wasn’t seen as a sustained treatment.”

One factor weighing against broader acceptance is that the mechanism by which sipuleucel-T works in patients was not established in the initial clinical trials. It is thought to work through dendritic cells in the blood. These cells have receptors that recognize the chemical signatures of microbes, cancer cells and other antigens, and when they spot one, the dendritic cells attach it to a protein on their surface like a red flag. These warnings kick-start T cells into action, spurring them to hunt down and kill foreign cells that display the antigen. But the clinical trials did not look for activated T cells or their markers in patient samples, says urologist Martin Sanda at Emory University School of Medicine in Atlanta, Georgia, and for that reason it is difficult to know why it works for some men and not for others. Researchers are now investigating the activity of specific cells in men who have had the treatment.

That the complex therapy works very well in some men is reason enough for many physicians to offer it. “I have to advocate for my patients,” says Fong. He and other oncologists know that some patients respond well to immunotherapy — something that is not reflected in the average survival numbers. Their tumours do not shrink, but they stop growing, and some men are stable for years.

Administering sipuleucel-T to patients is much more complicated than providing conventional therapies.

Survival rates for patients with late-stage disease who are given conventional treatment plunge to zero after a year or two. By contrast, the graph of survival over time for those given immunotherapy has a ‘long tail’ — never reaching zero in clinical trials. For Fong, one patient in particular illustrates the hope for the therapy. The patient’s recurrent metastatic prostate cancer had become resistant to hormone therapy. “He got the usual treatment and responded as most patients do,” Fong says. The treatments worked for a while, shrinking his tumours for a few months, after which they grew anew. Then Fong treated him with a course of sipuleucel-T. Five years later, his cancer has not grown, nor has he needed further treatment.


CANCER VACCINES 

Even those such as Fong who offer the treatment to their patients agree with Pandha, who says that immunotherapy needs to move away from “bespoke personalized medicine” like sipuleucel-T. To that end, researchers are working on off-the-shelf vaccines for prostate cancer. Furthest along in clinical trials is a vaccine developed at the US National Cancer Institute (NCI).

Called PROSTVAC, this therapy borrows from the playbook of infectious diseases, using two weakened viruses — vaccinia and fowlpox — engineered to carry prostate-specific antigen (PSA). The vaccine has been in the works since the late 1990s, starting in the lab of NCI immunologist Jeffrey Schlom. It showed promising results in phase II clinical trials, in which patients remained progression free for an average of 12 months’, and it is now in phase III clinical trials for treating metastatic prostate cancer.

Vaccines target specific antigens — and in the case of PROSTVAC, PSA is the molecule of choice. PSA is a self-antigen: it is made by healthy, as well as cancerous, prostate tissue. But PROSTVAC is a therapeutic vaccine, which is intended to be given only to men who have already had their cancerous prostate gland removed. In these men, the only cells producing PSA — and therefore the only cells that the vaccine will target — are cancer cells. The vaccines also seem to have an effect called antigen spreading, says James Gulley, a tumour immunologist at NCI. Once the immune system identifies and attacks the tumour, it recognizes and goes after other tumour antigens that it finds on its own.

Researchers have discovered additional targets for treating prostate cancer, and some believe that the immune system may be able to mount a better response to vaccines that target an antigen that is unique to the tumour, rather than a self-antigen such as PSA — or to vaccines that target multiple antigens.

Many transcription factors — regulatory proteins that promote or block gene expression — are overexpressed in tumour cells and so are a good target for cancer therapy. Sanda is testing, in animal models, whether the transcription factors ERG and SIM2 can be used as antigens.

Charles Drake, an oncologist and immunologist at the Johns Hopkins School of Medicine in Baltimore, Maryland, is taking a different approach: a quadruple-antigen vaccine akin to a Swiss Army knife. This experimental vaccine uses prostate acid phosphatase — the same antigen used in sipuleucel-T cell therapy — along with another protein called prostate-specific membrane antigen. Both are found in normal prostate tissue, but a third antigen is specific to prostate cancer. And a fourth is a protein that is overexpressed in cells left behind after prostate removal, and is considered a prostate-cancer precursor gene product.

Instead of a virus, Drake’s vaccine uses attenuated Listeria bacteria as the carrier. These weakened microbes have been used in other vaccines, including one for pancreatic cancer, which Drake says elicited a strong immune response in a phase II trial led by another group at Johns Hopkins. He hopes to see the same in the first clinical trials of the prostate-cancer vaccine, set to begin by early 2016.

THERAPY
Immunotherapy on trial

After disappointing results from the first approved immunotherapy for prostate cancer, researchers are developing a host of alternatives that could deliver better results for more patients.

PROSTVAC (phase III). Developed at the National Cancer Institute, this multicourse viral vaccine activates the immune system against prostate-specific antigen.

Hormone and checkpoint therapy (phase II). The high levels of testosterone in prostate tumours inhibit the activity of cancer-killing T cells. Combining hormone therapy with the checkpoint therapy ipilimumab could combat this.

PROSTVAC and ipilimumab (phase II). By combining a viral vaccine and a checkpoint therapy, it is hoped that one will activate T cells and the other will keep the tumour from suppressing them.

Sipuleucel-T with checkpoint therapy and chemotherapy (phase II). This trial is looking for synergy between the cell therapy sipuleucel-T and checkpoint therapy, along with the conventional chemotherapy drug cyclophosphamide.

Sipuleucel-T with ipilimumab (phase II). By combining sipuleucel-T with checkpoint therapy, researchers hope to determine the order in which they should be given: block tumour suppression first and then provide cell therapy, or the other way around.

DNA vaccine (phase II). Instead of a microbial carrier, this vaccine uses a naked DNA plasmid that codes for prostate acid phosphatase. K.B.

Drake says Listeria is easier to grow in culture than vaccinia and fowlpox. And the Listeria vaccine can be given multiple times without the need to switch carriers, as PROSTVAC does. Other researchers are experimenting with vaccines that use DNA, with no carrier at all. A phase II trial of a DNA vaccine now under way will indicate whether this method elicits as strong an immune response as the attenuated pathogens — a result that will be of keen interest to researchers such as Sanda who have not yet chosen a carrier for their novel antigen targets.

Trials of these vaccines depend on better monitoring of biomarkers, which are the key to finding out why some patients respond very well and others not at all. “We’re learning how to collect patient samples and not just look at everything in a mouse,” says MD Anderson oncologist Padmanee Sharma. Although much can be learned from mouse studies, she says, the interconnected co-evolution of tumour and immune system needs to be studied in people.


COMBO DEAL

Tumours take advantage of naturally occurring checkpoints that prevent healthy immune reactions from becoming dangerous. Once a T cell is activated by an antigen and expands its numbers, it starts expressing a checkpoint receptor. “This puts an expiration on T cells of a day or three,” says Curran. That is a good thing — you do not want billions of killer immune cells accumulating in your lungs after your cold clears up. But tumours turn this safety mechanism to their advantage, using that receptor as a target for their own suppressive signals, so that T cells never get going. Checkpoint therapies target and block these suppressive signals.

T cells have to be attracted to a tumour in the first place, however; otherwise checkpoint therapy has no effect. Treatments such as vaccines and cell therapies (sipuleucel-T) stimulate this process, and a combination of these treatments and checkpoint therapy may be the best way forward. This is now being tested in patients.

The conventional wisdom is that checkpoint therapy works well in mutation-rich cancers such as melanoma because these tumours generate high levels of antigens that attract T cells to the tumour. The T cells then just need a little boost from checkpoint therapy. Prostate tumours are not as rich in T cells, a deficit that researchers suspect is because they have too few mutations to catch the immune system’s attention.

Curran thinks it is more complicated than that. After all, he says, prostate cancer is not uniquely low in mutations among cancers — it ranks somewhere in the middle. On average, prostate cancers have about 50 mutations — and each one of them should be read by the immune system as an antigen. That is almost five times as many potential antigen targets as the influenza A virus, which does provoke an immune response. It is not just about the numbers, Curran argues.

When Curran saw how much better checkpoint therapy worked against melanoma than against prostate cancer, he came up with a new approach. He concentrated on the tumour microenvironment. Prostate tumours differ from healthy tissue in that they contain high levels of testosterone and low levels of oxygen. “That’s everything T cells hate,” he says. Besides which, the tumours are poorly vascularized — they are a backwater of the circulatory system along which T cells travel.

In 2011, Curran recalled something he had 
heard in graduate school about drugs that tar- 
get tumour hypoxia, and wondered if that may 
be an avenue to improve the effectiveness of 
immunotherapy. To explore that possibility, he 
began a collaboration with Threshold Pharma- 
ceuticals in South San Francisco, California, 
which makes a drug called evofosfamide. This 
compound circulates in a non-toxic form 
until it reaches a region of low oxygenation, 
which triggers the release of a DNA-damaging 
agent. Curran wondered what would happen 
after the drug had killed tumour cells in the 
hypoxic areas of tumours. Would those areas 
become a wasteland — or would T cells find 
a foothold? Curran found that, in a mouse 
model of prostate cancer, cancer-cell killing 
is followed by a wound-healing response and 
the growth of new blood vessels. That brings 
oxygenated blood and, it seems, a more
T-cell-friendly environment. “T cells can then enter
the areas they were formerly blocked out of,”
says Curran. After introducing the evofosfamide,
Curran and his colleagues in Houston
administered checkpoint therapy to prevent 
the arriving T cells from being suppressed. 
Curran reported these results at the inaugural
International Cancer Immunotherapy Conference
this year and is now designing a human
trial of this combination.

Hypoxia is not the only environmental bar- 
rier to T cells. Another combination-therapy 
clinical trial addresses the high levels of 
immune-suppressing testosterone in prostate 
tumours by combining hormone therapy (to 
lower testosterone) and checkpoint therapy. 
And it is now becoming evident that some 
chemotherapies that were thought to work 
only by killing cancer cells are dependent 
on the immune system to work. They might 
also be fruitfully combined with checkpoint 
therapy to fight prostate cancer. 

Combination therapy is the great hope of 
prostate-cancer immunologists. There is no 
guarantee that these therapies will avoid the 
cost problems associated with sipuleucel-T. 
But if combining treatments allows more 
men to go into remission — or perhaps even 
be cured — the high price tags may not raise 
as many eyebrows. For immunotherapy, says 
Pandha, combinations are “the final piece of
the jigsaw.”


Katherine Bourzac is a freelance science 
writer in San Francisco, California. 
PROSTATE CANCER

OUTLOOK

Despite advances in detection and therapy, much about this common malignancy remains unknown. Here are some of the most important unresolved issues.

BY RICHARD HODSON

4 BIG QUESTIONS

What causes prostate cancer?

Worldwide, prostate cancer is the second most common cancer in men, after lung cancer. Identifying a preventable cause of this disease could reduce the number of cases.

Risk increases with age, and inherited factors are estimated to be responsible for 5–9% of cancers. Risk is five times higher in men with BRCA2 mutations. Despite extensive research, the disease has not been clearly linked to any preventable risk factors.

Taking into account the differing rates of prostate-specific antigen (PSA) testing in populations could help to firm up links. Arsenic and cadmium compounds, anabolic steroids and ionizing radiation may be causes; carrots and soya may reduce the risk.

Is PSA testing an effective method of screening?

Measuring levels of PSA in the blood is often used to detect prostate cancer. Without a reliable test, in some cases the first symptoms of the cancer are signs that it has spread to the bones, where it is much less treatable.

Rates of diagnosis spiked in the 1990s in the United States, partly because of the use of PSA screening for men without symptoms. But it is likely that many men were unnecessarily treated for cancers that would never have caused harm.

The PSA test could be a useful procedure if it is applied using evidence-based guidelines (page S123). Combining the screen with other analyses, such as testing for genetic markers, could reduce the number of unnecessary treatments.

Is it safe to leave prostate cancer alone?

The most common treatments for localized disease (removal of the prostate and radiotherapy) have side effects, such as incontinence and sexual dysfunction. Men with less-aggressive tumours might be better off avoiding these procedures.

Between 2010 and 2013, half of US men with low-risk prostate cancer had their prostates removed, whereas 40% opted to watch and wait. Some studies have suggested that low-risk patients can be safely monitored for more than a decade.

The challenge facing active surveillance is knowing which men have slow-growing tumours that can be left, and which are more aggressive. New methods for telling aggressive and indolent cancer cells apart are being investigated.

Can advanced prostate cancer be treated?

Once prostate cancer has spread to the lymph nodes and bones, the outlook is poor. Five-year survival rates for metastatic cancer are one-third of those for localized disease; advanced prostate cancer is considered incurable.

Therapies for advanced prostate cancer have emerged only in the past decade. The go-to treatment is chemical castration: drugs are used to suppress male hormones. This can prolong life by two or three years before tumours become resistant.

Drugs designed to treat castration-resistant tumours are also facing a resistance problem: 20–40% of patients do not respond to these therapies, and their efficacy is eventually lost in all men (see page S128).

Richard Hodson is supplements editor for Nature.
A year has passed since we published the first Nature Index supplement about China, and the data accrued in that time reveal another remarkable period for science in that country. In the past 12 months, the growth of China’s output in the index has dwarfed that of any other nation.

In this supplement, we analyse three 
years of research output from China — 
from 2012 to 2014 — providing a telling 
snapshot of the country’s emergence as 
a scientific superpower, a phenomenon 
watched with intense interest around the 
globe. The articles in this index focus on 
cities with particularly interesting stories 
to tell. 

Nationally, China’s weighted fractional count (WFC) rose 37% between 2012 and 2014, and growth in this metric was notably high in Hangzhou, Xi’an and Chengdu (see ‘At the very heart of progress’, page S179). Further explanation of how WFC and other metrics are calculated can be found in the guide to the index on page S190.

Measuring output by WFC reinforces 
the status of Beijing, Shanghai and 
Nanjing as the dominant scientific 
centres, which our feature article on page 
S176 explores. 

In light of China's ongoing drive to use 
science and technology to move away 
from its economic reliance on traditional 
manufacturing, we also examined the 
nation’s industry hubs. 
The Nature Index 2015 China, a supplement to Nature, is
produced by Nature Publishing Group, a division of Macmillan
Nature Index, a website maintained by Nature Publishing
The index shows that cutting-edge life science has matured quickly in Shenzhen, Beijing and Wuhan. In these cities, front-line science is yielding practical outcomes and bringing returns that will stoke the fires of the Chinese economy. The city of Shenzhen, in particular, has experienced a remarkable transformation into a research-based industry hub, and companies based there now account for almost half of the country’s international patent filings (see ‘The changing face of industry’, page S184).

As China cements its role as the world’s
second-largest producer of high-quality
research papers, it is gaining momentum 
from academic collaborations. Nature 
Index data reveal that Hong Kong, Hefei 
and Tianjin are active in the pursuit of 
international or domestic associations. 
All three recorded a high collaboration 
score, a metric of institutional collabora- 
tion in terms of co-authorship of articles 
in journals covered by the index. 

Each year, the Nature Index presents 
a more comprehensive picture of the 
patterns driving research. China's 
scientific ascension is likely to continue, 
a phenomenon that the index, and 
future China-focused supplements, 
will be well placed to follow. 


Karen McGhee 

Guest editor, Nature Index 
Nicky Phillips 
BUILDING A POWERHOUSE


The story of China’s phenomenal growth in scientific output during the past three years can be told through the experience of eleven cities. Each has displayed impressive output, measurable in one way or another, as determined by analysis of Nature Index data from 2012 to 2014.

Four index metrics have been used to evaluate the performances of China’s cities: article count (AC), fractional count (FC), collaboration score and weighted fractional count (WFC). (For a full explanation of these metrics, see page S190.)

Represented here in yellow are China’s scientific heavyweights Beijing, Shanghai and Nanjing: the cities that have shown the highest total output. The index data also reveal that the cities where total scientific output has been growing fastest, China’s rising stars, are Xi’an, Chengdu and Hangzhou, shown in orange. Delving deeper into index data identifies Shenzhen, Beijing and Wuhan, represented in red, as the nation’s industrial research powerhouses, where high scientific output is being used to generate economic return. Meanwhile, the cities most actively pursuing partnerships to advance scientific discoveries are Hefei, Tianjin and Hong Kong, in purple.


Data analysis by Larissa Kogleck 
Figure: the 11 cities (circle size is relative to WFC in 2014). Rising stars: Xi’an, Chengdu and Hangzhou experienced significant growth in their contribution to the Nature Index.
Hefei WFC: 179 (2012), 212 (2013), 255 (2014).
Wuhan WFC: 192 (2012), 217 (2013), 257 (2014).
CHINA


China’s many hopeful and determined graduates take their place in a rich and varied research landscape that is transforming the country’s fortunes.


THE RAPID RISE OF A RESEARCH NATION 


China’s economic boom is mirrored by its similarly meteoric rise in high-quality science.


BY YINGYING ZHOU 


China has ambitious plans to source as much as 15% of its energy from renewable sources by 2020, at a time when its economy is projected to slow. It also aspires to be the next space superpower while facing major health and environmental challenges, such as an ageing population and water shortages.

The Chinese government knows that it can surmount these challenges and achieve its goals only through science. Indeed, China is pegging its future prosperity on a knowledge-based economy, underpinned by research and innovation. For a country that invented paper, gunpowder and the compass, such lofty ambitions could be realized. This year, pharmacologist Tu Youyou became the first Chinese researcher to be awarded the Nobel Prize in Physiology or Medicine, for helping to discover a new malaria drug that has saved millions of lives.

“With a solid base built upon the large quantity of research, China is [about] to take off in world-leading innovations and scientific breakthroughs,” says He Fuchu, the founding president of PHOENIX, the Chinese National Center for Protein Sciences. “High-quality research is built upon the accumulation of incremental advances,” He says.

The Nature Index shows China is already a 
high-quality scientific powerhouse. Since the 
first Nature Index database started in 2012, 
China's total contribution has risen to become 
the second largest in the world, surpassed only 
by the United States. 

But what sets China apart is the rapid growth of its WFC. While China’s contribution grew 37% from 2012 to 2014, the United States saw a 4% drop over the same period.


AN ECONOMIC IMPERATIVE 
A key driver of China’s scientific progress is its burgeoning economy, which has dazzled the world since China embarked in the early 1980s on a transformation from a centrally planned to a market-based economy. In the past three decades, China has achieved a consistently impressive average annual GDP growth rate of around 10%, and has overtaken Japan to become the world’s second-largest national economy, behind the United States.

“The economic success has fuelled the nation’s investment in science and technology,” says Liu Zhu, a researcher at Harvard University’s Kennedy School of Government, who is also currently a fellow in sustainability science at the California Institute of Technology. China’s unfettered growth cannot last forever (economic growth has slowed, with the GDP growth rate falling to 7.4% last year, its lowest in 25 years), but it has been the subject of global awe and fascination.

Figures from the National Bureau of Statis- 
tics of China show that during the past decade 
the nation’s total research and development 
(R&D) expenditure also blossomed, achieving 
an average growth rate of more than 20%. In 
2014, R&D expenditure totalled 1,330 billion 
yuan, equivalent to 2.1% of the national GDP. 
China’s rapid growth trajectory in R&D invest- 
ments is in sharp contrast to the constraints 
placed on R&D budgets of the United States, 
Japan, and most European countries, still 
recovering from the global economic crisis 
that began in 2007. 

Recognizing the importance of scientific research in driving technological innovation and economic progress, in 2006 the Chinese government unveiled its National Medium- and Long-Term Plan for the Development of Science and Technology, setting out a path to transform the country into a “science powerhouse” by 2020. The 15-year plan called for an “indigenous innovation” campaign, putting science and technology development at the centre of the national development strategy. Under the strategy, investment in higher education was emphasized, recognizing that human resources are at the heart of scientific development.

Figure: CHINA BOOMING. This cross-section of countries in the Nature Index shows how remarkable China’s increase in high-quality science output has been in recent years.

Figure: CHEMISTRY CHAMPS. Change in output of the top five leading countries in chemistry.

China has made great efforts to expand its higher education system and enlarge its scientific workforce. The number of PhD graduates in science and engineering has soared in the past decade, along with the number of graduates with bachelor’s degrees. Central and local government efforts to attract Chinese-born scientists to return from overseas work have paid off. The prominent 1,000 Talents Plan, initiated in 2008 by the Central Organization Department of the Chinese Communist Party, has hugely exceeded its eponymous goal, having attracted more than 4,180 top-level scientists from abroad by mid-2014. The cumulative number of returned PhD holders reached 110,000 in 2014.

“The improvement of the research capabilities of Chinese researchers and the returning of foreign-trained Chinese scientists from overseas certainly adds to the momentum of China’s scientific growth,” says Wang Jun, the former chief executive officer, and now partner, of the successful genomics sequencing company BGI. David Reiner, a senior lecturer in science and technology policy at Cambridge University’s business school and a keen observer of China, says the return of its large scientific diaspora also fosters a supportive research culture.

With a deep cultural reverence for education, the Chinese hold scientists and their research in high regard. “The government investment and promotion, combined with the determination of Chinese scientists and societal support, have fostered a culture of innovation,” says Wang. “China now has the opportunity not just to be seen as a global research giant but also to establish a long-term culture of innovation that will undoubtedly lead to myriad scientific discoveries in decades to come.”

Wang Jun says returning scientists help growth.


BOLSTERING QUANTITY WITH QUALITY 

The research assessment system has also 
played a role in China’s rapid growth in out- 
put. An increasing focus on evaluation systems 
that measure quality is shifting emphasis from 
quantity-driven metrics. Most universities and 
research institutions now evaluate researchers 
based on the number of publications in high- 
impact journals rather than the sheer volume 
of publications, according to Ren Xiaobing,
chairman of Xi’an Jiaotong University’s
Frontier Institute of Science and Technology.

But Reiner warns that the increased pres- 
sure on researchers to focus solely on pub- 
lishing their work in academic journals could 
encourage academic fraud. “The downside of 
an exclusive focus on quantitative metrics is 
that they may blind what is really important 
to research, making it difficult for scientists to 
explore blue-sky research ideas,” he says. 

Some major research institutions, such as the Chinese Academy of Sciences (CAS), and prestigious universities, such as Xi’an Jiaotong University, are beginning to include other evaluations in their researcher assessments. “Other than the criterion of high-impact publications, we have also adopted more nuanced and individual-focused assessment criteria that are based on a researcher’s relative performance compared to his or her peers,” says Ren.


DISCIPLINARY STRENGTHS

China’s booming scientific output is concentrated in specific subject areas, a trend that has continued since 2012. Chemistry and physical sciences clearly dominate the country’s total publishing output in the Nature Index (see ‘Chemistry champs’). The WFC figure for chemistry in 2014 was 3,783, accounting for 61% of the country’s total WFC, while physical sciences made up 30% of China’s publishing output in the index. By comparison, the WFC of other top contributing countries, such as the United States, Germany and the United Kingdom, is distributed more evenly across subject areas.

The Chinese Academy of Sciences (CAS) is the top institutional producer of chemistry WFC, both in China and around the globe (see ‘China’s superstar’). Its Institute of Chemistry (ICCAS) is the top contributing CAS institute by WFC. Its research strengths lie in molecular and nanosciences, organic and polymeric materials, chemical biology, and energy and green chemistry.

These cutting-edge areas of chemistry tend to have an applied aspect and are essential for industrial innovation. For instance, a study by ICCAS researchers on the assembly mechanism of organic composite materials contributed strongly to the development of flexible photonics and the realization of nanophotonic circuits for next-generation optical information processing.

A congress of the Chinese Academy of Sciences, a central plank of the country’s research prowess.

CHINA’S SUPERSTAR

The Chinese Academy of Sciences (CAS) is the world’s largest institutional contributor to the Nature Index. In 2014 its WFC was 1,308 (its AC was 3,124), significantly higher than that of the second-ranked institution, Harvard University, with a WFC of 865. By subject area, CAS leads not only in chemistry, but also in physical sciences, and earth and environmental sciences, with higher WFCs in these major subject areas than any other research institution worldwide. CAS employs more than 60,000 people, has 104 research institutes and has been central to China’s modern scientific development. Its 2015 budget was 54 billion yuan (US$8.4 billion), a 9% increase from 2014.

Figure (collaboration map): Chinese policy is to encourage international cooperation in scientific research. A large diaspora of Chinese-heritage scientists around the world, particularly in the United States, has forged bonds between researchers in China and elsewhere. United Kingdom: the United Kingdom is leading Europe’s growing engagement with Chinese scientists. Saudi Arabia: China’s collaboration with Saudi Arabia has grown more than any other partnership in the Middle East or Africa.

The Chinese government has a crucial role in directing the country’s science and technology development. “The government’s emphasis on the commercialization of high technology and the capacity of scientific research to drive industrial productivity possibly explains the strong focus on chemistry, particularly the subfields that are easily translated into commercial production,” says Liu.

In line with the demands of the national development strategy, China is also making efforts to innovate in relatively newer fields in life sciences and environmental sciences, including … resources, agriculture, environmental protection and human health, which were identified as research priorities in 2006 in China’s 15-year plan. “Strong demand for new energy and the need to reduce pollutants emission in energy consumption will drive China’s growth in environmental sciences,” anticipates Liu, whose background is in this field.

Life sciences are also expected to make great advances in the near future. Between 2012 and 2014, China’s output in this area grew by 30%. Fields such as genomics and protein sciences, stem cell and cloning technology, and gene therapy have already experienced significant progress. “[China is] set to become the global powerhouse of gene and protein research, leading this exciting field in life sciences and making grand discoveries with profound impacts,” He explains.


Winner of the Nobel Prize in Physiology or Medicine, Tu Youyou.

Figure (collaboration map): Australia: China’s collaborations with its regional neighbour have markedly increased compared to other Asia-Pacific countries. Owing to Taiwan’s distinct research funding and management, the graphic does not include its collaboration data.


INCREASED INTERNATIONAL COLLABORATION

Collaboration is an increasingly significant aspect of modern science, and China’s collaboration scores in the index reflect this trend. The recent Nature Index 2015 Collaborations supplement revealed that China’s international partnerships are soaring, with its collaboration score rising 31% from 2012 to 2014. Collaboration score is the sum of the fractional counts (FC) for each of China’s bilateral partnerships.
Almost half of China's international collabo- 
ration score in 2014 came from partnerships 
with the United States. Correspondingly, 
China has become the United States’ largest 
collaborator, surpassing Germany in 2014. 
Other important international collaborators 
for China are other top contributing countries 
to the Nature Index, such as the United King- 
dom and Japan (see ‘Collaboration hotspots’). 
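As a rough illustration of how a bilateral collaboration score of this kind can be tallied, the sketch below assumes that each article carries a fractional count of 1 shared equally among its authors, and that a country pair's score adds both countries' shares on every article they co-author. The function names, input format and example articles are hypothetical, and the journal weighting behind WFC is ignored; this is not the index's own implementation.

```python
from collections import defaultdict

def fractional_counts(author_countries):
    """Split one article's credit (1.0) equally among its authors.

    author_countries: one country code per author (hypothetical input format).
    Returns a dict mapping country -> fractional count (FC) for this article.
    """
    share = 1.0 / len(author_countries)
    fc = defaultdict(float)
    for country in author_countries:
        fc[country] += share
    return dict(fc)

def collaboration_score(articles, a, b):
    """Sum the FCs of countries a and b over the articles they co-authored."""
    total = 0.0
    for author_countries in articles:
        fc = fractional_counts(author_countries)
        if a in fc and b in fc:  # only joint articles count
            total += fc[a] + fc[b]
    return total

# Hypothetical example: three articles, authors listed by country.
articles = [
    ["CN", "CN", "US", "US"],  # CN 0.5 + US 0.5 -> contributes 1.0
    ["CN", "US", "DE"],        # CN 1/3 + US 1/3 -> contributes 2/3
    ["CN", "CN"],              # no US author -> contributes nothing
]
print(round(collaboration_score(articles, "CN", "US"), 3))  # 1.667
```

Counting both partners' shares means that a joint paper written only by Chinese and US authors adds a full 1.0 to the pair's score, whereas a paper that also includes authors from a third country adds less.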

With the return of many Chinese scientists 
who have trained abroad, international col- 
laborations are often based on personal ties. 
This has raised questions about China’s role
in these collaborations, with some suggesting it is
merely providing cheap labour, working on
ideas at the behest of former supervisors.

But in the past five years, Reiner says, there has been a shift in these partnerships as Chinese scientists play a more significant role in the research and Chinese institutes contribute a greater proportion of the funding. In some international collaborations, for instance the Human Liver Proteome Project (HLPP) led by He, China is already setting scientific objectives and making important theoretical and technological innovations. “The nature of the scientific revolution is long-term,” Reiner says. “As the output of scientific research from China is growing, I have no doubt that China will be leading more and more major international programmes, providing more important and, in some cases, critical contributions to international collaboration proportional to its input.”
A lion guards Beijing’s Forbidden City as a symbol of strength. China’s capital is itself a stronghold of scientific achievement, along with Shanghai and Nanjing.
THREE GIANTS TIGHTEN THEIR GRIP


The benefits of economics and history converge with the demands of population growth 
and sustainability issues in China’s most productive research and technology centres. 


BY PENG TIAN 


It will come as no surprise that the top-performing Chinese cities in the Nature Index are Beijing, Shanghai and Nanjing. All three are significant players economically and politically, Beijing and Shanghai particularly.

At all levels of Chinese government, officials see innovation through science and technology as critical for the nation to achieve continued economic growth on a more environmentally sustainable path. The central government has invested heavily in science and technology to improve productivity and upgrade some industries, such as the manufacture of advanced high-speed trains.

This, in turn, has reinforced the status of Beijing, Shanghai and Nanjing, the Chinese cities with the highest WFCs in 2014 (see ‘China’s top 10’).

The local governments of advanced provinces also push technological innovation to achieve economic ascendancy over other cities. Market forces are playing an increasingly significant role in Beijing, Shanghai and Nanjing, as both international and domestic commercial enterprises work with universities to develop next-generation technologies.

Crucially, China’s research system was 
significantly rebuilt after the turmoil of the 
Cultural Revolution — the social and political 
movement that began in 1966. 

A long period of economic growth has been 
mirrored by committed investment in science 
and technology. Besides the National Natural 
Science Foundation of China (NSFC), there 
are several ongoing programmes that promote 
basic research and technological innovations in 
universities and institutes. These have included
the 863 and 973 Programmes under the Min- 
istry of Science and Technology (MOST), and 
Projects 211 and 985 under the Ministry of 
Education. Through these initiatives elite uni- 
versities and institutes in Beijing, Shanghai and 
Nanjing have received enormous funding sup- 
port to build advanced research facilities and 
improve research quality. 


BEIJING 
Beijing, the nation’s capital, has been the centre
of power in China for centuries, and a research
and industry stronghold since the foundation
of the People’s Republic of China in 1949.

Beijing benefits most from the systems of 
resource allocation established after 1949 
when the communist government restructured 
and relocated the country’s main education 
and research centres. 

It has the highest number of institutions in the Nature Index of the three cities (131 in 2014) and is particularly strong in chemistry and physical sciences. This contributes greatly to Beijing’s overall output in the Nature Index. Beijing has also inherited some of the most prestigious universities established prior to 1949. These include China’s leading Peking University (PKU) and Tsinghua University, which receive resources from central and local governments. Several institutes of the Chinese Academy of Sciences (CAS), such as the Institute of Chemistry and the Institute of Physics, are also located in Beijing. Between them, Beijing’s top 10 institutions accounted for around 60% of the city’s overall 2014 WFC (see ‘Cities of influence’).

CHINA’S TOP 10
The country’s most productive cities in the Nature Index by WFC in 2014: 1 Beijing; 2 Shanghai; 3 Nanjing; 4 Wuhan; 5 Hefei; 6 Changchun; 7 Hong Kong; 8 Hangzhou; 9 Guangzhou; 10 Tianjin. The chart also shows the proportions of China’s AC and of China’s WFC contributed by the top 10 cities.

The vast resources allocated to these universities have paved the way for some groundbreaking research, including an important development in quantum computing by Duan Luming from Tsinghua University. In a paper published in Nature, he and colleagues described experiments that bring robust quantum computation at room temperature closer to reality. Generous funding has also helped physicist Peng Lianmao at Peking University, who is developing technology that can build carbon nanotube semiconductor devices and integrated circuits. In a paper published in Applied Physics Letters, his team detailed the construction of high-performance carbon nanotube transistors and integrated circuits, which represent the future of computer processors. This work received funds from MOST, the NSFC and the Beijing Municipal Science and Technology Commission.

Wang Yonggang focuses on new lithium batteries.

CITIES OF INFLUENCE
Beijing hosts the most institutions, while Shanghai and Nanjing’s WFC comes largely from their top 10 performers.

“The research work of constructing nano 
devices and integrated circuits needs huge 
funding,” Peng says. Since 2011, his lab has 
received about 70 million RMB from Project 
973. “PKU’s special position in China has 
played a key role in helping us to get support 
from the central government.” 

Now, he says, the lab is the only one in 
the world that can construct 10-nm carbon 
nanotube complementary metal-oxide-semi- 
conductor (CMOS) integrated circuits. In 2012, 
the application potential of this attracted local 
government money from the Beijing Municipal 
Science and Technology Commission. “Beijing
Municipality is helping us to make some long-term
development strategies for carbon-based
integrated circuits,” Peng explains. He is optimistic
that a clear strategy and further funding
will come through at the national level to take 
the technology even further. 
The manufacture of advanced high-speed trains has attracted significant government funding.


SHANGHAI 
WFC rank China: 2
AC: 1,955


In an idyllic location on the estuary of the great
Yangtze River in the centre of East China’s 
Yangtze River Delta, Shanghai has become 
a world-renowned port and global financial 
hub since it opened to international trade in 
the 1840s. Shanghai, which has been one of the 
world’s major manufacturing centres for more 
than two decades, receives generous research 
resources from the regional economy. 

With a 2014 population of more than 
24 million, Shanghai is now the largest and 
most populous city in China. It has the prestig- 
ious Fudan University as well as many research 
institutes of the CAS and other universities 
built after 1949. 

Shanghai has fewer than half as many institutions in the index as Beijing (52 in 2014), but its top 10 produce a similar output to the capital's top 10. Shanghai’s strength lies in chemistry (see ‘Shanghai’s best game’). More than 60% of the city’s output in the Nature Index is in chemistry, and Fudan University and the CAS Shanghai Institute of Organic Chemistry are the biggest contributors.

“Besides the promotion from the state and Shanghai, international corporations, giant state-owned enterprises, and private enterprises also actively collaborate with Shanghai's universities and research institutes to develop new technologies,” says Wang Yonggang, a materials chemist and associate professor in the Department of Chemistry at Fudan University. Between 20% and 30% of his lab’s funding comes from such collaborations.

Wang's research focuses on new types of 
lithium battery. He's already had some promis- 
ing results published in Chemical Communica- 
tions, which show that these devices have the 
potential to be next-generation batteries for 
everything from electric cars to smartphones. 

Wang is a partner of the Collaborative Innovation Center of Chemistry for Energy Materials (iChEM), a project of Plan 2011. “The centre can coordinate experts from different research fields, make communication more efficient, and has the capacity for commercializing the achievements,” Wang explains.


NANJING
WFC rank China: 3
AC: 1,064


As the capital of the wealthy eastern coastal province of Jiangsu, Nanjing is located in a region rich in economic and technological activity. Nanjing means ‘southern capital’ in Chinese, an indication of its status, and is the second largest city after Shanghai in the prosperous Yangtze River Delta. In recent years, Nanjing’s largest growth across all the subject areas has, as with Shanghai, been in chemistry. Also like Shanghai, almost 60% of its output in the Nature Index is in chemistry. To help the city differentiate itself from its regional counterpart, grants to promote materials science and astrophysics have been offered to research groups in Nanjing.

Nanjing University is the main source of the city’s scientific discovery and technological innovations and the city’s main contributor, accounting for more than half of Nanjing’s overall 2014 output (see ‘Leading the pack’). The Collaborative Innovation Center of Advanced Microstructures (CICAM), which is part of Plan 2011, generates most of Nanjing’s research on artificial microstructure materials. As well as basic research, this centre also tries to translate research findings into practical applications to meet the technological needs of the delta’s industries.

LEADING THE PACK

Nanjing’s overall growth is driven by the performance of a single institution.

CICAM acoustic physicist Bin Liang, a professor at Nanjing University, has designed and experimentally realized a new acoustic absorption material that may be used for noise reduction and to make echo-free underwater materials. His research was recently published in Applied Physics Letters.

Not so long ago the acoustic qualities of this new material were thought to be impossible to achieve, Liang says. “We always think about trying to extend the limits of knowledge,” he explains. “Now, we want to translate the breakthroughs into applications.”
Chengdu’s sparkling and fast-growing skyline is a shining testament to its transformation into a high-tech hub, which is home to almost 100 cutting-edge labs.
AT THE VERY HEART OF PROGRESS


The ambition driving China’s astonishing progress in the output of high-quality science 
is particularly strong in some cities, whose growth far outstrips expectation. 


BY SARAH O’MEARA 


When neuroscientist Anna Wang Roe was looking two years ago for somewhere to set up a new state-of-the-art interdisciplinary neuroscience and technology institute, her search ended in the far eastern city of Hangzhou. “I looked at many top universities. But then I went to Zhejiang University [in Hangzhou] and it really stood out, even over some higher ranked Chinese institutions,” she recalls.

Between 2012 and 2014, China’s Nature Index WFC rose by 37%. But several cities grew at an even faster rate, with Hangzhou, Xi’an and Chengdu among the standout examples (see ‘Stellar performers’).

The overall growth in publication output from these cities has largely been in chemistry (see ‘Subject specialities’). Yet researchers working in these cities point to many factors, beyond expertise in a single discipline, for their success. Wang Roe, for instance, was so impressed by Hangzhou’s atmosphere of energy, passion and collaboration that she approached the city’s Zhejiang University, China's fifth largest Nature Index contributor in 2014, with her institute proposal. “At Zhejiang they have created a highly motivated research environment where people can share and explore freely. This doesn't always happen in China,” she notes.


XI’AN
WFC rank China: 12
AC: 309


Situated in China’s northwestern Shaanxi province, Xi’an shows the largest relative increase in WFC of the three cities, increasing by 142% from 2012 to 2014 (see ‘Stellar performers’). WFC (weighted fractional count) is a metric that apportions credit for each article according to the affiliations of the contributing authors.
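The apportioning step behind the WFC can be illustrated with a small calculation. This is a minimal sketch under stated assumptions, not the Nature Index's own implementation: it assumes each article carries one unit of credit, shared equally among its authors, with each author's share then split equally among that author's affiliations. The "weighted" part is modelled by a hypothetical per-article `weight` argument, since this article does not give the weights the index applies. The function names and institutions are illustrative only.

```python
from collections import defaultdict
from fractions import Fraction


def fractional_counts(authors, weight=Fraction(1)):
    """Split one article's credit among institutions.

    `authors` holds one list of affiliations per author. Assumed scheme:
    the article is worth `weight` in total, shared equally among authors,
    and each author's share is shared equally among their affiliations.
    """
    credit = defaultdict(Fraction)
    per_author = Fraction(1, len(authors))
    for affiliations in authors:
        share = per_author / len(affiliations)
        for institution in affiliations:
            credit[institution] += share * weight
    return dict(credit)


def total_wfc(articles):
    """Sum credit per institution over an iterable of (authors, weight) pairs."""
    totals = defaultdict(Fraction)
    for authors, weight in articles:
        for institution, share in fractional_counts(authors, weight).items():
            totals[institution] += share
    return dict(totals)


if __name__ == "__main__":
    # Hypothetical article: three authors, the second with two affiliations.
    article = [["Univ A"], ["Univ A", "Institute B"], ["Institute B"]]
    # Univ A gets 1/3 + 1/6 = 1/2; Institute B gets 1/6 + 1/3 = 1/2.
    print(fractional_counts(article))
```

Exact `Fraction` arithmetic keeps each article's shares summing to exactly its weight, which makes it easy to confirm that no credit is created or double-counted when authors hold several affiliations.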

Xi’an has a history of at least 3,000 years, more than a thousand of those as the capital of ancient Chinese dynasties. Since the early 1990s, when it emerged as the lead city of China’s Western Development programme, Xi’an has promoted the value of tech-driven industry and established one of the earliest national high-tech development zones. The city is also home to a sophisticated national aviation base that was integral in manufacturing Shenzhou 6, the craft that carried China’s second manned space flight in 2005.

Between 2012 and 2014, the rapid rise in the number of publications from Xi’an institutions featured in the Nature Index’s 68 high-impact journals was led by Xi’an Jiaotong University. The largest subject increases were in chemistry and physical sciences.

Shan Zhiwei is deputy director of the university’s State Key Laboratory for Mechanical Behavior of Materials, which contributed to almost 25% of Xi’an Jiaotong University’s articles in the Nature Index during the three-year growth spurt. He believes the university’s success is driven by government strategies to attract international science talent to China, combined with innovative recruitment policies. “Xi’an Jiaotong has developed a series of local policies to attract talents, including offering attractive salaries, strong financial support, and an open and approachable […],” explains Shan, who is also a director of the Center for Advancing Materials Performance from the Nanoscale (CAMP-nano).

STELLAR PERFORMERS

The proportional increase of China's top 20 growing cities compared to their overall increase in WFC over three consecutive years between 2012 and 2014.

RISING STAR

Xi'an experienced an exceptional rise in the relative increase of its WFC, and it is also among the top five in terms of its absolute increase over that time.

STEADY RISERS

Chengdu and Hangzhou experienced a relative and absolute increase in their WFC.

THE LEADERS

Cities such as Beijing, Shanghai and Nanjing experienced the highest absolute increase in WFC.

While Xi'an, Chengdu and Hangzhou experienced exceptional overall growth, it has been driven mostly by chemistry and physical sciences.

He adds that in the past five years, Xi’an Jiaotong has hired many foreign-born experts and enticed hundreds of Chinese-born professionals working abroad to return, bringing their education and experience. The recruitment drive has been accompanied by a willingness to embrace new approaches to work. “We drew on the strength of the new people,” Shan says. “They have introduced new methods of training students, managing team members and handling instruments. In addition, the recruits have strong financial support. This combination has led to high efficiency of work and yielded good research outputs.”


CHENGDU
WFC rank China: 13
AC: 287


Science and technology-driven development 
has transformed Chengdu into one of the 
world’s fastest growing cities and led to a surge 
in high-quality research output. The WFC 
of Chengdu in the Nature Index increased 
by almost 80% between 2012 and 2014 (see 
‘Stellar performers’). 

The city has allocated enormous resources 
to create an environment where innovation 
thrives, starting in the laboratory. Chengdu 
now has 10 national key laboratories funded 
by China’s central government, 30 labs 
established by branches of local government 
and 53 universities. 

Between 2012 and 2014, the Nature Index contribution of the city’s Sichuan University soared, particularly in chemistry and related disciplines. “Chengdu has become a hotbed for academic achievement,” says Wei Yuquan, vice president of Sichuan University and director of the National Key Laboratory of Biotherapy, which contributed to articles in the Nature Index between 2012 and 2014. “Our multidisciplinary research centre has established an integrated technology chain for the discovery and development of innovative drug candidates in a single institute,” he says.

Chengdu is the capital of the landlocked Sichuan province and has been growing rapidly since 2000, when China's central government began pouring money into poorer interior cities. Official policy has focused on turning the city into a high-tech hub, a role Chengdu was well positioned to fill. “Local government is establishing Chengdu as an area of innovation, and has set up many foundations to support research projects including basic research and to recruit talented researchers,” Wei says.

Xi’an’s sophisticated aviation base was central to the manufacture of the spacecraft Shenzhou 6.

Among them is Gong Qiyong, deputy 
dean of West China Medical School, Sichuan 
University in Chengdu. Ten years ago he 
resigned from a faculty position at the 
University of Liverpool in England to become 
the director of West China Hospital’s Huaxi 
MR Research Center. “Now my group has 
been internationally recognized as one of the 
leading teams in psychiatric imaging,” he says. 
“Recently we obtained a huge grant from the government to build the National Research Center of Translational Medicine.”


HANGZHOU
WFC rank China: 8
AC: 458


Hangzhou is one of China’s quintessential historic cities, but its rapidly increasing rate of high-impact research suggests its best years are yet to come. Hangzhou, where the Nature Index WFC jumped by 55% from 2012 to 2014, has the largest absolute WFC of the three highest performing cities (see ‘Stellar performers’).

During the past decade, Hangzhou has become a hub for science-driven innovation from laboratory research to tech start-ups. It’s home to the Alibaba Group, China’s leading e-commerce service provider with an estimated […] promote technology transfer from the lab into business, and start-up funds for academics setting up companies at incubator sites.

Wang Yong, who contributed to papers from the university in the Nature Index between 2012 and 2014, is a professor at Zhejiang University’s School of Materials Science and Engineering, which hosts a nationally funded laboratory for silicon materials. Wang is researching the catalytic mechanism of nanocrystals at the nanoscale in order to develop high-performance catalysts for future industry use. “The local government is rich and invests a large amount of money in universities,” he explains. “There is good start-up funding available, financial incentives for high-quality work, state-of-the-art facilities, and the opportunity to innovate and collaborate with industry.”

Hangzhou also benefits from leaders whose 
clear vision was a big factor behind Wang Roe’s 
decision two years ago to ask Zhejiang Uni- 
versity for a US$25 million grant to build her 
Interdisciplinary Institute of Neuroscience and 
Technology, of which she is now the director. 

“The administration tells me the grant was the largest investment of any university in China on a single project,” Wang Roe says.

The five-storey building with 20 labs and a large primate facility officially opened in October 2015. Academics came from all over the world for the opening ceremony and conference. “[They] were very impressed [and] amazed that this could be achieved in such a short time,” Wang Roe says. “I looked at many high-level institutions in China for this project, but Zhejiang had it all: excellent engineering, optics, materials science, information sciences, neuroscience and medicine, and a collaborative environment.

“There’s huge energy in Hangzhou. Once the 
administration decides on something, they go 
for it with full force with the long-term in mind. 
It makes you think differently, it really does.”
The flow of the Shenzhen River mirrors the extraordinary dynamism of the city it passes, home to innovative manufacturers and a commercial transformation.


THE CHANGING FACE OF INDUSTRY 


Amid fierce international competitiveness, governments at all levels are responding 
by orchestrating collaborations between industry and academic institutions. 


BY DAVID CYRANOSKI 


Like many nations, China is hotly pursuing innovation and the economic benefit of bringing ideas to the market. But China’s drive to embrace an innovation-based economy in place of its reliance on manufacturing is daunting. Most big companies are state owned and traditionally averse to funding research. Despite a dramatic increase in basic research output over the past two decades, only a small percentage is converted to industrial application.

Yet as Shenzhen, Beijing and Wuhan show, an industrial base built on cutting-edge science has matured quickly in some regions, often with local policy support. These cities host many of the Chinese corporations with the highest growth in research output published in the Nature Index’s 68 high-impact journals between 2012 and 2014 (see ‘Industry champions’). With a swag of intellectual property and revenue as proof, these regions are leading China in its quest for transformation.


SHENZHEN
WFC rank China: 21
AC: 165


Shenzhen, in the country’s southeast, has had the 
most marked transformation to a research-based 
industry hub of any city in China, and probably 
the world. Just 35 years ago it was a fishing vil- 
lage; now it’s a thriving metropolis that links 
Hong Kong with China's mainland. Companies 
based there account for almost half of the coun- 
try’s international patent filings. 

Most of these filings are in telecommunica- 
tions and electronics, with Huawei and ZTE 
leading the way. These two ICT multination- 
als, along with Shenzhen-based rechargeable 
battery-maker BYD Co., Ltd, boast China’s three 
biggest patent portfolios. 
Shenzhen is also home to the Kuang-chi
Institute of Advanced Technology, a manu- 
facturer of radar absorbent materials used in 
stealth technology. It was founded by a group 
of Chinese scientists returning from stints 
working in the United States. 

Casting a broader net, the Shenzhen 
Institutes of Advanced Technology, one of the 
most industrially prolific units of the Chinese 
Academy of Sciences (CAS), has established 
collaborations with more than 150 companies 
during its 10-year history. 

The genomics sequencing powerhouse 
BGI, in particular, has successfully melded 
basic research in Shenzhen with commercial 
operations. It’s one of China’s biggest contrib- 
utors to high-impact scientific publications, 
holds some 400 patents and has another 300 
pending. More than half are related to genes, 
especially in the areas of agriculture and 
rare human diseases. Another third relate to 
sequencing technology, and the rest are special 
applications. BGI also holds the crown for the 
Chinese company with the highest 2014 WFC. 
Indeed, it is in the world’s top 20 corporate 
contributors to the Nature Index. 


CHINA'S TOP 10

While half of China's leading companies in the index are located in Beijing, the highest producers of high-quality science are in other cities. Circles sized by WFC in 2014.

INDUSTRY CHAMPIONS

Among the Chinese cities that host […]

Funding support for greener production improves conditions for car and battery manufacturer BYD.


The success of BGI and other Shenzhen companies, says Xu Xun, executive director of BGI Research, has been bolstered by the city’s designation in 1980 as China’s first special economic zone, an initiative that made it easier to start a company and interact with foreign companies. “In other cities, many companies are founded by or supported by the government. Patents are not important for them,” says Xu. “Here, there are a lot of private companies and they need intellectual property so we put a lot of effort into it.”

BGI is presently celebrating the first harvest of a drought-tolerant millet strain that was bred on the strength of discoveries from a sequencing project at the company. BGI hopes this new millet will find a large market in a China where there are increasing concerns about water resources. BGI will also expand into the clinical sequencing market with a new line of desktop sequencers, trying to take advantage of China's move towards personalized medicine.

The Shenzhen government uses the number 
of patents as one measure of a company’s value 
to the city, with companies deemed significant 
enjoying benefits such as special fast-track 
approval processes. It also gives annual awards 
for the most impressive innovations. “We get 
pressure to have good intellectual property both 
from the government and from our needs as a 
private company,” says Xu. “Shenzhen always 
finds ways to support innovation.” 


BEIJING
WFC rank in China: 1
AC: 5,163


While Shenzhen might have flexible rules and an entrepreneurial environment, Beijing has its own advantages to make it a bustling and innovative industrial centre, says Jin Qinxian, head of the technology transfer office at Tsinghua University. The city’s numerous universities, most notably Tsinghua and Peking University, and a slew of institutes either independent or affiliated with the CAS have provided fertile ground for technology transfer. “The history, the culture, the number of institutes and universities make Beijing very powerful,” says Jin.
Beijing also has a number of highly talented, internationally renowned researchers who have returned from working or studying abroad. Among their number is Wang Xiaodong, a former Howard Hughes researcher at the University of Texas Southwestern, in the United States, who designed and now directs the National Institute of Biological Sciences in Beijing. He also founded BeiGene, a company with several large and small molecule cancer […] and which received US$97 million in financing this spring. It has already partnered with Merck Serono on two drugs. Beijing also has a sequencing company, Novogene, run by an ex-BGI employee, and now competing with BGI. Novogene rounds out the top 10 Beijing companies with the highest 2014 WFC (see ‘China’s top 10’).

Tsinghua, which ranks first for research 
funding in China, has a particularly rich field of 
scientists active in commercialization, transfer- 
ring technology that has helped China become 
a leader in carbon nanotubes and high-speed 
computing. It has also contributed to a broad 
range of biomedical breakthroughs including 
cancer biomarkers and medical devices such 
as pacemakers. 

Many of these are interdisciplinary projects. 
A team led by chemical engineer Dehua Liu, for 
example, designed a new enzymatic process for 
converting renewable oils and fats to biodiesel. A 
bioenergy production plant is now pumping out 
20,000 tons of biodiesel per year — a figure that 
will jump five-fold next year — and the technol- 
ogy has been transferred to companies in several 
countries, including Germany and Brazil. 

In 2014 alone, Tsinghua had 2,010 domestic 
patent applications (1,360 of which have been 
accepted) and 264 more in the United States. It
received 150 million RMB from 61 transfer or 
licensing agreements. 

Specific local programmes help, says Jin. Beijing’s municipal government provides start-up funds in exchange for shares in the companies. It also nurtures Beijing firms with initiatives such as placing the first purchase order for their products, even before those products are proven.

But what has really driven the city’s industrial blossoming is its rich human resources, concentrated especially in the massive Zhongguancun industrial zone and technology hub that neighbours both Tsinghua and Peking universities. “Companies come and they get access to students, to laboratories, to professors,” says Jin.


WUHAN
WFC rank China: 4
AC: 619


Wuhan is one of the fastest growing cities in China in terms of the scientific output of its corporations, and it is quick to capitalize (see ‘Industry champions’). The Huazhong University of Science and Technology (HUST), for example, has a long list of successful spin-off companies based on optical fibre development for high-intensity lasers and 3D printing. With the success of research-intensive companies such as Huagong Tech, which produces lasers, holograms, and optical communication devices, and Guide Infrared, which manufactures a cutting-edge night vision camera, Wuhan has taken a place at the forefront of China’s optoelectronics and telecommunications boom.

Tang Jiang, a thin-film photovoltaics researcher at Wuhan National Laboratory for Optoelectronics, says the presence of many leading universities in Wuhan, such as HUST, which ranked 19th in engineering in the US News global survey of universities, has played a major role in gearing up this industrial output.

Tang’s institute has several examples of technology transfer, including blue photoluminescent materials for organic light-emitting diodes (OLEDs), a UV light-emitting diode used for curing in printing and a ‘micro-optical tomography system’ for brain imaging.

Local government policies have promoted technology transfer from universities, establishing a requirement that a research team receives at least a 70% share of the technology transfer profit. “These policies significantly encourage professors in universities to focus on application-oriented research and the consequent technology transfer,” Tang explains.

Tang has yet to commercialize his own research on infrared photodetection and on new materials based on the promising thermoelectric antimony selenide. This work could lead to non-toxic and cheap next-generation flexible solar cells. But once his devices reach an energy conversion efficiency threshold of 10%, from the current best of 5.6%, he will reach out to industry.
Hong Kong-based Yuen Kwok-Yung, who identified the cause of SARS, says collaboration between scientists of different perspectives leads to novel breakthroughs.


ALLIANCES FOR SCIENTIFIC SUCCESS 


The diverse histories of China’s cities strongly influence their collaboration patterns. 


BY HEPENG JIA 


Xu Aimin knew that collaborating was his best chance for success in his quest at the University of Hong Kong (HKU) to identify the relationship between obesity and the glucose-regulating hormone adiponectin.

Connecting with colleagues in Saudi Ara- 
bia and Korea, he formed a team that revealed 
something unexpected. While obesity is a con- 
sequence of metabolic dysfunction, it is also 
exacerbated by it, because the expression of 
adiponectin is reduced. Altering this activity 
may represent a new strategy for the treatment 
of obesity-related disorders, their study, pub- 
lished in Nature Communications, suggests. 




The findings came about through the team’s 
combined knowledge and resource base, 
drawing on the Korean researchers’ physiology 
expertise, the HKU lab’s excellence in biology 
and unique animal models, and the Saudi 
Arabian lab’s clinical samples. 

“Hong Kong is a relatively small place,” says Xu, a professor in the university’s department of medicine. “Collaboration with [Chinese] mainland and overseas institutes is the best way to maximize the economic and societal impacts of our research.”

Xu’s first-hand experience echoes what large-scale studies show about science in the twenty-first century: research resulting from collaborations is more frequently cited, especially papers with international co-authors.

In the Nature Index, three Chinese cities 
stand out for their collaborative orientation: 
Hong Kong, Hefei and Tianjin. 

The focuses of their collaborations differ,
but all bring great reward. While Hong Kong and
Hefei institutions have formed a record num- 
ber of partnerships with their international 
peers (see ‘Hong Kong’s hotspots’), Tianjin 
scientists have focused on forging local links 
(see ‘Close ties’). 


A sculptural sundial at HKUST represents early 
human invention, an inspiration for scientists. 


Exploring the roots of these patterns reveals 
the importance of history in shaping regional 
strengths. 

As Wu Yishan, vice president of the Chinese Academy of Science and Technology for Development, puts it: “With the joint forces of historical tradition, research capacity, local policies and personal links, Chinese regions have formed different preferences in research collaborations.”


HONG KONG
WFC rank China: 7
AC: 600


The index shows that Hong Kong’s collabora- 
tions are firmly entrenched with, but certainly 
not limited to, mainland China, the United 
States and Europe. For example, the two lead- 
ing international collaborators for the Hong 
Kong University of Science and Technology 
(HKUST) are the National University of Singa- 
pore and Singapore's Agency for Science, Tech- 
nology and Research. HKU’s most frequent 
overseas collaborator is Taiwan's National Tsing 
Hua University (see ‘Hong Kong’s hotspots’). 

The collaborative atmosphere in Hong Kong 
appears to be fostered by funding policies. The 
region's local agencies support a large number 
of collaborative and joint funding schemes 
with other bodies at home and overseas. 

An example is the HKU-Pasteur Research Centre, which was established to tackle emerging infectious diseases in China and elsewhere in Asia. Yuen Kwok-Yung, an HKU microbiologist who is renowned for tracing the human SARS (severe acute respiratory syndrome) coronavirus to Chinese horseshoe bats, was appointed its first co-director. Yuen then went on to help launch the HKU AIDS Institute in collaboration with the Aaron Diamond AIDS Research Center, an affiliate of Rockefeller University in New York. “Scientists from different cultures and ethnicities have very different and novel perspectives for looking at a scientific question and provide varied approaches to find the solution,” Yuen says.

HONG KONG’S HOTSPOTS

While collaborations with the United States dominate the top five partnerships for Hong Kong's leading institutions, each has a particular regional focus for their remaining partners.

REACHING OUT

The University of Hong Kong has many partnerships in the United States and western Europe.


HEFEI
WFC rank China: 5
AC: 696


Like Hong Kong, Hefei, capital of the Chinese 
mainland hinterland province of Anhui, has a 
limited number of research institutions in the 
index. One of these, the University of Science 
and Technology of China (USTC), is the 
driving force behind the city’s collaborations. 
USTC’s WFC between 2012 and 2014 was 517,
more than six times that of the second player, 
the Hefei Institutes of Physical Science, which 
is affiliated with the Chinese Academy of 
Sciences (CAS). 

History and tradition help explain USTC’s unique role. Founded by CAS, the university moved from Beijing to Hefei in 1969 during the Cultural Revolution. Following the opening up of China, a large number of USTC alumni went to overseas institutions, temporarily or permanently. This, combined with a dearth of other local institutes to work with, has pushed the university towards international collaborations for cutting-edge research, particularly in physics and chemistry.

Hefei’s USTC campus provides a highly supportive environment for collaborative research.

A USTC physics professor, Guo Guoping, says that he and colleagues regularly seek collaborations to promote theoretical developments based on their experimental results. “International partners are crucial to explore our frontier studies,” he says. A good example is a study Guo recently co-authored in Physical Review Letters with scientists from the United States and Japan, which explores the application of graphene in quantum communication.


USTC is jointly sponsored by CAS and the 
Ministry of Education. This historic link has 
contributed to the university’s widespread 
collaborations with CAS institutes. In 2014 
CAS was USTC’s largest partner, earning it 
a collaboration score of 136.43. The collabo- 
ration score is an indicator of an institution’s 
collaboration in terms of co-authorship of 
articles in the 68 high-impact journals covered 
by the index. 
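The article describes the collaboration score only as an indicator of co-authorship in the index's journals. As a hedged illustration, the sketch below assumes that a pair's score is the sum of both institutions' fractional counts on the articles they co-authored, with fractional counts computed as one unit of credit per article, split equally among authors and then among each author's affiliations. The function names and sample institutions are hypothetical, and the sketch is not the index's published method.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import combinations


def article_shares(authors):
    """Credit per institution for one article (assumed scheme: 1 per
    article, split equally by author, then equally by affiliation)."""
    shares = defaultdict(Fraction)
    for affiliations in authors:
        for institution in affiliations:
            shares[institution] += Fraction(1, len(authors) * len(affiliations))
    return shares


def collaboration_scores(articles):
    """For every pair of institutions on an article, add both partners'
    shares of that article to the pair's running score."""
    scores = defaultdict(Fraction)
    for authors in articles:
        shares = article_shares(authors)
        # Sorting gives each unordered pair one stable key, e.g. ("CAS", "USTC").
        for a, b in combinations(sorted(shares), 2):
            scores[(a, b)] += shares[a] + shares[b]
    return dict(scores)


if __name__ == "__main__":
    # Two hypothetical articles, each a list of per-author affiliation lists.
    articles = [
        [["CAS"], ["USTC"]],
        [["USTC"], ["USTC", "CAS"], ["MIT"]],
    ]
    for pair, score in sorted(collaboration_scores(articles).items()):
        print(pair, score)
```

Under this assumption a two-institution paper adds at most 1 to a pair's score, while papers with many partners spread smaller amounts across more pairs.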

Both Guo and Yuen at HKU argue that 
while policy support is important, science 
rather than money drives collaboration. 


“Special collaboration grants to support travel and conferences are good, but they only support the original ideas,” Guo says.

IT TAKES TWO

International and domestic collaboration scores of Tianjin’s top 10 collaborating institutions.


TIANJIN
WFC rank China: 10
AC: 461


Tianjin has more universities in the index than Hong Kong and Hefei, although much of its basic research is conducted at two institutions, Nankai and Tianjin universities. These universities had collaboration scores of 254.3 and 163.9, respectively, in 2014, much larger than the combined figures of the remaining universities and institutes in this northern city, which neighbours Beijing.

Research at these universities is highly complementary, partly for historical reasons. In 1952, following a Soviet model, the government of the newly founded People’s Republic of China transferred most of the science departments of Tianjin University (TJU) to Nankai while boosting TJU’s engineering capacity. As a result, TJU is strong in engineering research and application but relatively weak in basic research. To counter this, the neighbouring universities formed a strong partnership in chemistry and physical science research.

In 2012, with partial funding from the Ministry of Education and the Tianjin municipal government, TJU and Nankai jointly established the Tianjin Co-Innovation Center of Chemical Science and Engineering. “The centre has produced many of our co-authorships,” says Nankai materials chemist Yang Zhimou, who along with his team has authored several Nature Index papers. Many TJU faculty members graduated from Nankai, and vice versa, and they retain strong allegiances to their alma maters. This contributes to ongoing close collaboration, says Ma Jun-An, a professor in TJU’s Department of Chemistry who recently published in Organic Letters, a top chemistry journal in the Nature Index.

CLOSE TIES

Researchers at Nankai and Tianjin universities co-author more papers with each other than with any other domestic partner. Top five domestic collaborators in 2014 are shown.


Tianjin University researchers explore airflow for 
the first China-developed passenger jet, the C919. 


Nankai and TJU are Tianjin’s largest collaborators, and in 2014 they collaborated most frequently with each other (see ‘Close ties’). According to Yang, these strong local links have been forged as a result of relatively poor funding and a lack of equipment at Nankai and Tianjin universities. This situation has pushed researchers to exploit resources available locally.

But local partnerships do not thwart collaboration with other domestic and international partners. “Besides strong local partnership, we also have stable collaborations with CAS, CNRS [in France], RIKEN [in Japan] and the University of Texas [in the United States]. They are mutually supportive,” Ma says.

The Nankai-TJU Co-Innovation Center was established under policies of the ministries of education and of science and technology to encourage collaborations among Chinese research institutions. Since the programme was initiated in 2011, nearly 100 co-innovation centres have been recognized and partially funded by the ministries. Other funding comes from the universities themselves, local governments and, when a centre is industry-oriented, industry.
A GUIDE TO THE NATURE INDEX


A description of the terminology and methodology used in this supplement, 
and a guide to the functionality available free online at natureindex.com. 


The Nature Index is a database of author affiliations and institutional relationships. The index tracks contributions to articles published in a group of highly selective science journals, chosen by an independent group of active researchers.

The Nature Index provides absolute counts 
of publication productivity at the institutional 
and national level and, as such, is one indicator 
of global high-quality research output. 

Data in the Nature Index are updated 
monthly, with the most recent 12 months of 
data made available under a Creative Com- 
mons licence at natureindex.com. 

The database is compiled by Nature 
Publishing Group (NPG) in collaboration 
with Digital Science. 

The list of journals tracked by the Nature 
Index is under review, and from 2016 will be 
extended to include the clinical sciences. 


NATURE INDEX METRICS 

There are three measures provided by the 
Nature Index to track affiliation data. The sim- 
plest is the article count (AC). A country or 
institution is given an AC of 1 for each article 
that has at least one author from that country 
or institution. This is the case whether an arti- 
cle has one or a hundred authors, and it means 
that the same article can contribute to the AC 
of multiple countries or institutions. 
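The AC rule can be illustrated with a short Python sketch. The data layout is an assumption made for illustration only: each article is a list of authors, and each author is a list of affiliation names; the institutions "A", "B" and "C" are hypothetical. This is not the Nature Index's own code.

```python
from collections import Counter


def article_counts(articles):
    """Article count (AC): +1 per article for every institution that has at
    least one author on it, however many authors the article has."""
    ac = Counter()
    for authors in articles:
        # Distinct institutions on this article, so an institution with
        # several authors on the same paper is still counted only once.
        institutions = {inst for affiliations in authors for inst in affiliations}
        ac.update(institutions)
    return ac


# Hypothetical articles: each author is a list of affiliations.
papers = [
    [["A"], ["A"], ["B"]],  # three authors, two of them from A
    [["A", "C"]],           # one author with a joint A/C affiliation
]
print(article_counts(papers))
```

Institution A receives an AC of 2, one per article, regardless of how many of its authors appear on each paper.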

To get a better sense of a country or 
institution’s contribution to an article, and 
to remove the possibility of counting articles 
more than once, the Nature Index uses the 
fractional count (FC), which takes into 
account the relative contribution of each 
author to an article. The total FC available per 
paper is 1, which is shared between all authors 
under the assumption that each contributed 
equally. For instance, a paper with 10 authors 
means that each author receives an FC of 0.1. 
For authors who have joint affiliations, the 
individual FC is then split equally between 
each affiliation. 
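Using the same assumed data layout (a list of authors per article, each author a list of affiliations), the FC rule can be sketched in Python as follows. The 10-author paper and the institution names are hypothetical.

```python
from collections import defaultdict


def fractional_counts(articles):
    """Fractional count (FC): each article carries a total of 1, shared
    equally between its authors; an author's share is then split equally
    between that author's affiliations."""
    fc = defaultdict(float)
    for authors in articles:
        author_share = 1.0 / len(authors)
        for affiliations in authors:
            affiliation_share = author_share / len(affiliations)
            for inst in affiliations:
                fc[inst] += affiliation_share
    return dict(fc)


# Hypothetical 10-author paper: nine authors from A, one with a joint A/B affiliation.
paper = [["A"]] * 9 + [["A", "B"]]
fc = fractional_counts([paper])
print(fc)  # A is approximately 0.95, B approximately 0.05
```

Because every article contributes exactly 1 in total, summing FC across all institutions gives the number of articles, which is how FC avoids the multiple counting that AC allows.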

The third measure used is the weighted fractional count (WFC), which applies a weighting to the FC to adjust for the overrepresentation of papers in astronomy and astrophysics. The four journals in these disciplines publish about 50% of all papers in international journals in this field, approximately five times the equivalent percentage for other fields. Therefore, although the data for astronomy and astrophysics are compiled in the same way as for all other disciplines, articles from these journals are assigned one-fifth the weight of other articles (i.e., the FC is multiplied by 0.2 to derive the WFC).

A global indicator of high-quality research

natureindex.com users can search for specific institutions or countries and generate their own reports, ordered by article count (AC), fractional count (FC) or weighted fractional count (WFC). Each query will return a profile page that lists the country or institution’s recent research outputs, from which it is possible to drill down for more information. For example, articles can be displayed by journal, and then by article title. As in the supplement, research outputs are organized by subject area.

The total FC or WFC for an institution is
calculated by summing the FC or WFC for
individual authors.
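The weighting and summation steps can be sketched in Python. The sketch assumes each article's per-institution FC has already been computed and that each article carries a flag saying whether it appeared in one of the four astronomy and astrophysics journals; the flag, the function name and the example values are hypothetical.

```python
ASTRO_WEIGHT = 0.2  # astronomy/astrophysics articles count one-fifth


def weighted_fractional_count(articles):
    """Sum per-article FC into a WFC total for each institution.

    `articles` is an iterable of (is_astro, fc_by_institution) pairs, where
    fc_by_institution maps an institution to its FC on that article.
    """
    wfc = {}
    for is_astro, fc_by_institution in articles:
        weight = ASTRO_WEIGHT if is_astro else 1.0
        for inst, fc in fc_by_institution.items():
            wfc[inst] = wfc.get(inst, 0.0) + weight * fc
    return wfc


# Hypothetical: A has FC 0.5 on a chemistry paper and FC 1.0 on an astrophysics paper.
articles = [
    (False, {"A": 0.5, "B": 0.5}),
    (True, {"A": 1.0}),
]
wfc = weighted_fractional_count(articles)
# wfc["A"] is about 0.5 + 0.2 * 1.0 = 0.7, although A's unweighted FC total is 1.5
```

The weight is applied per article before summing, so an institution's WFC mixes full-weight and one-fifth-weight contributions in proportion to where it publishes.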

The process is similar for countries, 
although complicated by the fact that some 
institutions have overseas labs that will be 
counted towards their host country totals. 
What's more, there is great variability in the 
way authors present their affiliations. Every
effort is made to count affiliations consistently,
based on reasonable assumptions.

For more information on how the affiliation 
information is processed and counted, please 
see the FAQ section at natureindex.com. 
THE SUPPLEMENT

Nature Index 2015 China is based on data from the Nature Index, covering articles published over three consecutive years, from 1 January 2012 to 31 December 2014.

Most analyses within the supplement use WFC as the primary metric, as it provides a more even basis both for comparison across multiple disciplines and for determining the relative contribution of each city or institution. Some sections and graphics also refer to collaboration score. This is a relatively new metric that is derived by adding the FC for all the bilateral relationships for that institution or country. If institution A has relationships with two others, B and C, then the collaboration score is the sum of FC for A+B and A+C.
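One reading of this definition, sketched in Python: for every article two institutions share, their bilateral score grows by the sum of their two FCs on that article, and an institution's overall collaboration score is the total over all its partners. This interpretation of “the sum of FC for A+B”, the function name and the example FC values are all assumptions.

```python
from collections import defaultdict
from itertools import combinations


def collaboration_scores(articles):
    """`articles` is a list of {institution: FC} dicts, one per article.

    Returns (bilateral, total): bilateral[(X, Y)] is the summed FC of X and Y
    over the articles they share; total[X] adds up X's bilateral scores.
    """
    bilateral = defaultdict(float)
    for fc in articles:
        # Every unordered pair of institutions on the same article.
        for x, y in combinations(sorted(fc), 2):
            bilateral[(x, y)] += fc[x] + fc[y]
    total = defaultdict(float)
    for (x, y), score in bilateral.items():
        total[x] += score
        total[y] += score
    return dict(bilateral), dict(total)


# Hypothetical FC values for three articles involving A, B and C.
articles = [
    {"A": 0.5, "B": 0.5},
    {"A": 0.6, "C": 0.4},
    {"A": 0.5, "B": 0.25, "C": 0.25},
]
bilateral, total = collaboration_scores(articles)
# total["A"] = (A+B) + (A+C) = 1.75 + 1.75 = 3.5
```

Under this reading, a single-institution article contributes nothing, and an article shared by several institutions adds to every pair drawn from them.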


NATURE INDEX CHINA TABLES 


China’s leading institutions for high-quality science, ordered by weighted fractional 
count (WFC) for 2014. Also shown are the total number of articles, and the change in 
WFC from 2013. Articles are from the 68 journals that comprise the Nature Index 
(see ‘How to use the index’, S190). 
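The change column appears to be the relative change between the two WFC columns. A quick Python check against four rows of the top-200 table reproduces the printed values to one decimal place; because the published WFC figures are rounded to two decimal places, a recomputed change can occasionally differ from the printed one in the last digit.

```python
def percent_change(wfc_2013, wfc_2014):
    """Relative change in WFC from 2013 to 2014, in per cent."""
    return (wfc_2014 - wfc_2013) / wfc_2013 * 100


# WFC 2013 and WFC 2014 as printed in the top-200 table.
rows = {
    "Peking University (PKU)": (275.51, 293.86),
    "Tsinghua University (TH)": (195.15, 211.39),
    "University of Science and Technology of China (USTC)": (175.78, 193.90),
    "Institute of Chemistry (ICCAS), CAS": (124.85, 124.34),
}
for name, (w13, w14) in rows.items():
    print(f"{name}: {percent_change(w13, w14):.1f}%")
```

This prints 6.7%, 8.3%, 10.3% and -0.4%, matching the table.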


TOP 200 INSTITUTIONS

2014 RANK  INSTITUTION  WFC 2013  WFC 2014  AC 2014  CHANGE IN WFC 2013-2014

1 Peking University (PKU) 275.51 293.86 1,019 6.7%
2 Nanjing University (NJU) 196.52 215.08 518 9.4%
3 Tsinghua University (TH) 195.15 211.39 666 8.3%
4 University of Science and Technology of China (USTC) 175.78 193.90 561 10.3%
5 Zhejiang University (ZJU) 150.44 192.13 364 27.7%
6 Fudan University 129.42 166.21 356 28.4%
7 Institute of Chemistry (ICCAS), CAS 124.85 124.34 306 -0.4%
8 Shanghai Institute of Organic Chemistry (SIOC), CAS 105.62 114.25 210 8.2%
9 Lanzhou University (LZU) 69.72 110.38 186 58.3%
10 Shanghai Jiao Tong University (SJTU) 96.01 108.06 290 12.5%
11 Jilin University (JLU) 97.50 104.93 189 7.6%
12 Wuhan University (WHU) 98.90 96.93 164 -2.0%
13 Xiamen University (XMU) 76.02 95.56 215 25.7%
14 Nankai University (NKU) i332 93.43 230 -17.7%
15 Sichuan University (SCU) 76.83 93.36 177 21.5%
16 Soochow University 65.25 91.43 169 40.1%
17 Sun Yat-sen University (SYSU) 79.43 89.72 193 13.0%
18 University of Chinese Academy of Sciences (UCAS) TApeas 89.12 524 25.1%
19 Institute of Physics (IOP), CAS 77.24 87.88 267 13.8%
20 East China Normal University (ECNU) 65.56 83.17 148 26.9%
21 Changchun Institute of Applied Chemistry (CIAC), CAS 80.69 82.09 142 1.7%
22 Hunan University (HNU) 54.57 77.38 111 41.8%
23 Hong Kong University of Science and Technology (HKUST) 54.60 74.62 136 36.7%
24 The University of Hong Kong (HKU) 71.38 WAN/7) 186 0.5%
25 Dalian Institute of Chemical Physics (DICP), CAS 61.90 71.75 139 15.9%
26 East China University of Science and Technology (ECUST) 56.75 Jy 130 25.6%
27 Xi'an Jiaotong University (XJTU) 42.98 67.79 170 57.7%
28 Fujian Institute of Research on the Structure of Matter (FJIRSM), CAS 59.54 64.96 124 9.1%
29 Shandong University (SDU) 39.18 63.00 158 60.8%
30 Huazhong University of Science and Technology (HUST) 43.04 57.39 154 33.3%
31 Dalian University of Technology (DUT) 61.42 52.36 96 -14.7%
32 Shanghai Institutes for Biological Sciences (SIBS), CAS 51.44 52.12 131 1.3%
33 Southeast University (SEU) 30.94 51.64 110 66.9%
34 Beijing Normal University (BNU) 39.81 50.82 144 27.7%
35 Northeast Normal University (NENU) 30.73 48.53 67 57.9%
36 Tianjin University (TJU) 33.90 46.23 151 36.4%
37 Tongji University 40.83 45.85 107 12.3%
38 South China University of Technology (SCUT) 30.74 45.79 eal 49.0%
39 Hefei Institutes of Physical Science (HIPS), CAS 19.97 39.73 77 99.0%
40 Institute of Semiconductors (IOS), CAS 35.66 36.76 87 3.1%
41 Fuzhou University (FZU) 26.76 35.93 56 34.3%
42 The Chinese University of Hong Kong (CUHK) 39.39 35.82 110 -9.1%
43 People's Liberation Army (PLA) 42.90 35.33 132 -17.6%
97 Northeastern University (NEU) 2.10 13.66 Pail 550.0%
98 Institute of Genetics and Developmental Biology (IGDB), CAS 10.20 13.43 38 31.7%
99 Ningbo Institute of Materials Technology and Engineering (NIMTE), CAS 14.94 13.28 23 -11.1%
100 Beijing University of Technology (BJUT) ior Uepard 29 73.0%
101 Shanghai Normal University (SHNU) 5 13.20 24 251.8%
102 China Academy of Engineering Physics (CAEP) 13.27 13.11 50 -1.2%
103 Qingdao University of Science and Technology (QUST) 10.66 13.08 25 22.7%
104 Shaanxi Normal University (SNNU) 11.80 195 ae 9.8%
105 Nanjing University of Aeronautics and Astronautics (NUAA) 6.58 12.84 26 95.0%
106 Anhui Normal University (AHNU) 5.96 12.80 15 114.8%
107 Hong Kong Baptist University (HKBU) 12.77 12.78 33 0.1%
108 South University of Science and Technology of China (SUSTC) 2.24 12.52 Sil 458.3%
109 China Meteorological Administration (CMA) 9.93 12.34 35 24.3%
110 Wenzhou University (WZU) 8.05 ie 21 48.6%
111 Research Center for Eco-Environmental Sciences (RCEES), CAS 6.60 11.92 20 80.5%
112 South China Sea Institute of Oceanology (SCSIO), CAS 10.59 11.82 23 11.6%
113 Institute of Oceanology, CAS (IOCAS) Bile 11.48 Ze. 124.4%
114 Henan University (HENU) 12.53 71.25 18 -9.4%
115 Institute of Atmospheric Physics (IAP), CAS 17.87 11.34 38 -36.5%
116 Zhejiang University of Technology (ZJUT) 8.03 11.10 22 38.2%
117 China Pharmaceutical University (CPU) 3.88 10.87 24 180.1%
118 National University of Defense Technology (NUDT) 16.16 10.68 44 -33.9%
119 National Institute of Biological Sciences, Beijing (NIBS) 11.73 10.65 31 -9.2%
120 Jinan University (JNU) 4.31 10.62 30 146.3%
121 Hunan Normal University (HUNNU) 11.12 10.38 17 -6.7%
122 Nanchang University (NCU) 10.06 9.75 a -3.1%
123 Xidian University 3.78 9.65 12 155.5%
124 Institute of Microbiology (IM), CAS 9.09 9.59 24 5.6%
125 Nanjing University of Posts and Telecommunications (NUPT) 8.28 9.54 21 15.2%
126 Institute of Geology and Geophysics (IGG), CAS 7.94 9.48 22 19.3%
127 Renmin University of China (RUC) 8.89 9.46 24 6.4%
128 China Earthquake Administration (CEA) 9.50 9.46 26 -0.5%
129 Shanghai Institute of Optics and Fine Mechanics (SIOM), CAS 9.82 9.19 17 -6.4%
130 Qingdao Institute of Bioenergy and Bioprocess Technology (QIBEBT), CAS 2.90 9.10 te) 213.9%
131 Guangzhou Institutes of Biomedicine and Health (GIBH), CAS 19.60 8.91 18 -54.6%
132 Jiangsu Normal University (JSNU) 5.61 8.66 is 54.4%
133 Hebei University (HBU) 7.86 8.11 13 3.1%
134 Changchun Institute of Optics, Fine Mechanics and Physics (CIOMP), CAS 10.33 195 16 -23.1%
135 University of Shanghai for Science and Technology (USST) 4.15 7.94 12 91.5%
136 Hangzhou Normal University (HZNU) 9.52 7.94 33 -16.6%
137 Shenzhen University (SZU) 5.05 WHS 26 53.0%
138 Jiangxi Normal University (XNU) 3.14 7, 23 146.0%
139 Hefei University of Technology (HFUT) 8.97 7.70 6 -14.1%
140 Nanjing University of Science and Technology (NUST) 6.43 7.60 18 18.3%
141 Yangzhou University (YZU) 4.67 7.47 ils} 60.1%
142 Anhui University (AHU) Sal 7.28 18 96.1%
143 Zhejiang Normal University (ZJNU) 8.44 7.27 5) -13.9%
144 Northwest A&F University (NWAFU) 9.90 L226 ie) -26.6%
145 Shantou University (STU) 73 6.99 18 -1.9%
146 Institute of Botany (IBCAS) 3.02 6.71 15 122.3%
147 Southern Medical University (SMU) 3.47 6.70 23 93.4%
148 Chinese Academy of Agricultural Sciences (CAAS) 8.10 6.66 26 -17.8%
149 Hebei Normal University 1.40 6.59 15 371.3%
150 Purple Mountain Observatory (PMO), CAS 6.31 6.40 oF 1.5%
151 Heilongjiang University (HLJU) 6.67 6.39 0 -4.2%
152 North China Electric Power University (NCEPU) 6.50 6.27 14 -3.6%
153 Jiangnan University (JU) 8.91 6.21 2 -30.3%
154 Nanjing University of Information Science and Technology (NUIST) 7.74 6.09 25 -21.3%
155 Jiangsu University (JU) 3.42 6.04 il7/ 76.4%
156 Beijing Jiaotong University (BJTU) 4.15 5.97 15 43.9%
157 Nanjing Agricultural University (NAU) 2.59 5.83 18 125.0%
158 Northwest Normal University (NWNU) 10:25. Bail ll 2,184.8%
159 Xinjiang Technical Institute of Physics and Chemistry (XTIPC), CAS 4.98 5.66 0) 13.8%
160 University of Jinan (UJN) Bie! 5.65 10 14%
161 Capital Medical University (CMU) 3.05 5.54 38 81.6%
162 Shanghai Astronomical Observatory (SHAO), CAS 4.76 5.51 102 15.8%
163 Xuzhou Medical University (XZMC) 0.31 5.39 14 1,665.9%
164 Guangxi Normal University (GXNU) 4.60 5.30 13 15.3%
165 Tianjin Medical University (TMC) 6.73 5.21 17 -22.7%
166 Yantai Institute of Coastal Zone Research (YIC), CAS 2.14 4.95 8 131.2%
167 Capital Normal University (CNU) 6.65 4.88 19 -26.6%
168 Institute of Tibetan Plateau Research (ITP), CAS 7.22 4.73 16 -34.5%
169 Tianjin Normal University (TJNU) 0.94 4.71 18 402.1%
170 Qufu Normal University (QFNU) 3.63 4.67 i] 28.8%
171 Harbin Engineering University (HEU) Dey) 4.67 7 85.1%
172 Zhejiang Sci-Tech University (ZSTU) 4.94 4.64 13 -6.2%
173 Institute of Microelectronics (IME), CAS 6.80 4.44 dil -34.7%
174 Chengdu Institute of Biology (CIB), CAS 5.45 4.44 14 -18.6%
175 Beijing University of Posts and Telecommunications (BUPT) 3.42 4.37 7 28.0%
176 BGI 3.82 4.35 21 14.0%
177 Linyi University (LYU) 3.19 4.29 18 34.4%
178 Guizhou University (GZU) 3.18 4.29 10 35.0%
179 Ningbo University (NBU) 7.56 4.27 14 -43.5%
180 Qingdao University (QU) 2.89 4.27 14 47.6%
181 Institute of Vertebrate Paleontology and Paleoanthropology (IVPP), CAS 4.19 4.27 20 1.8%
182 Hebei University of Technology (HEBUT) 1.65 4.25 df 158.0%
183 Institute of Process Engineering (IPE), CAS Sh5i7/ 4.22 14 18.1%
184 Huaibei Normal University (HUN) 4.78 4.19 9 -12.2%
185 Yanshan University (YSU) 4.03 4.14 v 2.8%
186 Yantai University 0.64 4.12 9 541.1%
187 Institute of Mechanics (IM), CAS 3.84 4.09 9 6.7%
188 Tianjin University of Technology (TUT) bibs 4.00 a -27.7%
189 Chongqing Medical University (CQMU) 4.31 3.83 14 -11.2%
190 Wenzhou Medical University (WMU) 1.49 eeetl 11 155.7%
191 Kunming University of Science and Technology (KUST) 3.04 3.77 9 24.1%
192 Chinese Center for Disease Control and Prevention (China CDC) S10) S/T 24 Ws
193 Chengdu Institute of Organic Chemistry (CIOC), CAS 4.16 3.76 8 -9.6%
194 Donghua University (DHU) 6.34 Sie 11 -41.4%
195 Henan Polytechnic University (HPU) 0.68 ShV/Al ils} 442.5%
196 Anhui Medical University (AHMU) 2.12 oi! 16 74.7%
197 Hubei University (HUBU) 3.63 3.68 r/ 1.4%
198 National Space Science Center (NSSC), CAS 2.87 3.65 28 27.3%
199 Institute of Coal Chemistry (ICC), CAS 1.69 3.61 9 113.8%
200 Institute of Electrical Engineering (IEE), CAS 2.09 3.60 Uf 72.3%
TOP 50 INSTITUTIONS IN LIFE SCIENCES

Shanghai Institutes for Biological Sciences (SIBS), CAS
Peking University (PKU)
Tsinghua University (TH)
Shanghai Jiao Tong University (SJTU)
Zhejiang University (ZJU)
People's Liberation Army (PLA)
Sun Yat-sen University (SYSU)
Chinese Academy of Medical Sciences & Peking Union Medical College (CAMS & PUMC)
Shandong University (SDU)
Fudan University
University of Science and Technology of China (USTC)
Institute of Biophysics (IBP), CAS
University of Chinese Academy of Sciences (UCAS)
Institute of Zoology (IOZ), CAS
BGI
Institute of Genetics and Developmental Biology (IGDB), CAS
Wuhan University (WHU)
The University of Hong Kong (HKU)
Nanjing University (NJU)
Huazhong University of Science and Technology (HUST)
Huazhong Agricultural University (HZAU)
Hong Kong University of Science and Technology (HKUST)
National Institute of Biological Sciences, Beijing (NIBS)
Xiamen University (XMU)
China Agricultural University (CAU)
Beijing Normal University (BNU)
Institute of Microbiology (IM), CAS
Nanjing Medical University (NJMU)
Nankai University (NKU)
East China Normal University (ECNU)
The Chinese University of Hong Kong (CUHK)
Sichuan University (SCU)
Tongji University
Soochow University
Institute of Botany (IBCAS)
Tianjin Medical University (TMC)
Shanghai Institute of Materia Medica (SIMM), CAS
Central South University (CSU)
Beijing Institute of Genomics (BIG), CAS
Capital Medical University (CMU)
Institute of Vertebrate Paleontology and Paleoanthropology (IVPP), CAS
Southeast University (SEU)
Chinese Academy of Agricultural Sciences (CAAS)
Wenzhou Medical University (WMU)
Southern Medical University (SMU)
Anhui Medical University (AHMU)
Yunnan University (YNU)
Xi'an Jiaotong University (XJTU)
Kunming Institute of Zoology (KIZ), CAS
Nanjing Agricultural University (NAU)
39.4%
-17.8%
41.5%
175.7% 
-18.4% 
-8.1% 
20.6% 
76.2% 
-8.0% 
-11.7% 
30.8% 
14.4% 
21.5% 
22.3% 
21.6% 
31.9% 
79.7% 
-4.0% 
57.1% 
-2.9% 
57.9% 
48.2% 
39.7% 
-4.3% 
16.0% 
19.5% 
231.4% 
-16.4% 
24.1% 
90.8% 
-10.1% 
-51.3% 
138.5% 
11.9% 
37.5% 
-0.2% 
55.9% 
-44.4% 
224.7% 
26.0% 
61.0% 
196.1% 
303.9% 
0.6% 
26.8% 


NATURE INDEX 2015 | CHINA | S195


TOP 50 INSTITUTIONS IN CHEMISTRY

2014 RANK  INSTITUTION  WFC 2013  WFC 2014  AC 2014  CHANGE IN WFC 2013-2014

1 Peking University (PKU) 142.60 152.25 378 6.8%
2 Nanjing University (NJU) 117.69 129.85 220 10.3%
3 Zhejiang University (ZJU) 86.44 129.11 189 49.4%
4 Institute of Chemistry (ICCAS), CAS 119.81 120.18 287 0.3%
5 Fudan University 80.30 ial Gills) 198 45.9%
6 Shanghai Institute of Organic Chemistry (SIOC), CAS 105.23 113.47 206 7.8%
7 University of Science and Technology of China (USTC) 93.78 111.24 233 18.6%
8 Tsinghua University (TH) 92.19 107.43 249 16.5%
9 Lanzhou University (LZU) 50.35 88.25 124 75.3%
10 Jilin University (JLU) 73.64 80.64 133 9.5%
11 Changchun Institute of Applied Chemistry (CIAC), CAS Vie Ney 79.25 134 -0.3%
12 Sichuan University (SCU) 68.45 75.15 107 9.8%
13 Xiamen University (XMU) 59.87 73.56 146 22.9%
14 Dalian Institute of Chemical Physics (DICP), CAS 61.65 70.08 135 13.7%
15 Hunan University (HNU) 51.90 69.09 oy 33.1%
16 Nankai University (NKU) 86.00 67.62 166 -21.4%
17 East China University of Science and Technology (ECUST) 53.04 66.33 112 25.0%
18 Wuhan University (WHU) 63.32 65.98 100 4.2%
19 Fujian Institute of Research on the Structure of Matter (FJIRSM), CAS 55.22 63.42 121 14.9%
20 Soochow University 34.69 62.89 104 81.3%
21 Sun Yat-sen University (SYSU) 52.10 56.43 oT 8.3%
22 University of Chinese Academy of Sciences (UCAS) 47.89 55.93 278 16.8%
23 East China Normal University (ECNU) 41.54 55.26 86 33.0%
24 Shanghai Jiao Tong University (SJTU) 45.14 51.63 93 14.4%
25 Northeast Normal University (NENU) 25.65 42.59 54 66.0%
26 Dalian University of Technology (DUT) 47.16 41.33 16 -12.4%
27 South China University of Technology (SCUT) 28.75 39.39 1 37.0%
28 Tianjin University (TJU) 25.59 Sara) 131 45.4%
29 The University of Hong Kong (HKU) 39.04 36.31 60 -7.0%
30 Hong Kong University of Science and Technology (HKUST) 25.86 36.24 72 40.2%
31 Shandong University (SDU) 19.69 31.88 57 61.9%
32 Fuzhou University (FZU) 26.51 31.23 49 17.8%
33 Southeast University (SEU) 9.78 28.13 by/ 193.7%
34 Technical Institute of Physics and Chemistry (TIPC), CAS 22.50 28.42 61 26.3%
35 Beijing University of Chemical Technology (BUCT) 20.35 28.33 46 39.2%
36 Tongji University 19.51 28.04 44 43.7%
37 Huazhong University of Science and Technology (HUST) 12.58 26.68 58 112.0%
38 Xi'an Jiaotong University (XJTU) 10.72 26.50 67 147.1%
39 Shanghai Institute of Materia Medica (SIMM), CAS 18.28 24.56 39 34.4%
40 Northwest University (NWU) 10.29 21.95 25 113.3%
41 Shanghai University (SHU) 8.57 21.54 43 151.1%
42 Beijing Normal University (BNU) 10.56 21.11 37 99.9%
43 Hefei Institutes of Physical Science (HIPS), CAS 8.68 20.92 40 141.1%
44 Southwest University (SWU) 12.51 20.67 26 65.2%
45 Beijing Institute of Technology (BIT) 16.95 20.47 42 20.7%
46 Institute of Physics (IOP), CAS 14.31 20.41 is 42.7%
47 Nanjing Tech University (NanjingTech) 11.04 19.81 42 79.4%
48 National Center for Nanoscience and Technology (NCNST), CAS 21.82 19.69 38 -9.8%
49 Harbin Institute of Technology (HIT) 14.15 19.20 35 35.7%
50 Lanzhou Institute of Chemical Physics (LICP), CAS 15.25 18.86 44 23.7%
TOP 50 INSTITUTIONS IN PHYSICAL SCIENCES

Peking University (PKU)
Tsinghua University (TH)
Institute of Physics (IOP), CAS
University of Science and Technology of China (USTC)
Nanjing University (NJU)
Fudan University
Zhejiang University (ZJU)
Xi'an Jiaotong University (XJTU)
Institute of Semiconductors (IOS), CAS
Shanghai Jiao Tong University (SJTU)
Hong Kong University of Science and Technology (HKUST)
Soochow University
Jilin University (JLU)
Institute of Chemistry (ICCAS), CAS
Huazhong University of Science and Technology (HUST)
Southeast University (SEU)
Institute of Theoretical Physics (ITP), CAS
The University of Hong Kong (HKU)
Nankai University (NKU)
East China Normal University (ECNU)
National Astronomical Observatories (NAOC), CAS
University of Science and Technology Beijing (USTB)
Wuhan University (WHU)
Shanghai Institute of Ceramics, CAS (SICCAS)
Hefei Institutes of Physical Science (HIPS), CAS
Sun Yat-sen University (SYSU)
University of Chinese Academy of Sciences (UCAS)
Lanzhou University (LZU)
Institute of High Energy Physics (IHEP), CAS
University of Electronic Science and Technology of China (UESTC)
Beijing Normal University (BNU)
Beijing Institute of Technology (BIT)
Harbin Institute of Technology (HIT)
Shanghai Institute of Technical Physics (SITP), CAS
Suzhou Institute of Nano-Tech and Nano-Bionics (SINANO), CAS
City University of Hong Kong (CityU)
Beihang University (BUAA)
Shandong University (SDU)
Sichuan University (SCU)
Tongji University
Nanjing University of Aeronautics and Astronautics (NUAA)
Hunan University (HNU)
Shanghai Institute of Microsystem and Information Technology (SIMIT), CAS
Northwestern Polytechnical University (NPU)
Dalian University of Technology (DUT)
Changchun Institute of Applied Chemistry (CIAC), CAS
The Chinese University of Hong Kong (CUHK)
Xiamen University (XMU)
National University of Defense Technology (NUDT)
Shanghai University (SHU)
6.0%
10.8% 
7.6% 
3.1% 
6.9% 
43.8% 
-25.1% 
21.4% 
1.0% 
0.8% 
50.9% 
11.9% 
-0.8% 
9.3% 
2.0% 
24.5% 
-1.0% 
-15.2% 
-7.9% 
25.6% 
5.0% 
12.0% 
1.0% 
5.2% 
61.5% 
-18.1% 
8.3% 
30.4% 
-29.0% 
49.5% 
-11.6% 
115.4% 
-27.8% 
89.8% 
5.2% 
-28.5% 
40.3% 
TOP 25 INSTITUTIONS IN EARTH AND ENVIRONMENTAL SCIENCES

2014 RANK  INSTITUTION  WFC 2013  WFC 2014  AC 2014  CHANGE IN WFC 2013-2014

1 Nanjing University (NJU) 9.72 14.94 33 53.7%
2 Peking University (PKU) Si 14.71 50 101.1%
3 State Oceanic Administration (SOA) 5.90 14.67 s2 148.5%
4 China Meteorological Administration (CMA) 9.76 12.32 34 26.2%
5 Institute of Atmospheric Physics (IAP), CAS 17.79 11.06 36 -37.8%
6 China University of Geosciences (CUG) 10.40 10.96 26 5.3%
7 Institute of Oceanology, CAS (IOCAS) 5.12 10.03 18 96.0%
8 South China Sea Institute of Oceanology (SCSIO), CAS 54 9.88 20 82.7%
9 China Earthquake Administration (CEA) 9.50 9.46 26 -0.5%
10 Ocean University of China (OUC) 9.60 8.92 24 -7.1%
11 University of Science and Technology of China (USTC) 6.91 S71 22 26.2%
12 Institute of Geology and Geophysics (IGG), CAS 7.54 8.26 ue) 9.5%
13 Beijing Normal University (BNU) 6.78 7.58 at 11.8%
14 University of Chinese Academy of Sciences (UCAS) 3.22 5.44 30 69.0%
15 Xiamen University (XMU) 1.81 Doe 14 194.1%
16 Lanzhou University (LZU) 5.25 5.29 ili5} 0.9%
17 Nanjing University of Information Science and Technology (NUIST) 6.37 4.96 22 -22.1%
18 Institute of Tibetan Plateau Research (ITP), CAS 7.22 4.73 16 -34.5%
19 Wuhan University (WHU) 5.93 4.65 9 -21.5%
20 Hong Kong University of Science and Technology (HKUST) 4.54 4.46 5) -1.7%
21 Tsinghua University (TH) 3.09 3.89 13 25.9%
22 Institute of Earth Environment (IEE), CAS 2.66 3.50 11 32.0%
23 Guangzhou Institute of Geochemistry (GIG), CAS 3.82 3.49 ll -8.4%
24 Zhejiang University (ZJU) 2.05 3.26 8 59.2%
25 Yunnan University (YNU) 1.12 3.14 6 181.2%


TOP 25 INSTITUTIONS IN NATURE AND SCIENCE

2014 RANK  INSTITUTION  WFC 2013  WFC 2014  AC 2014  CHANGE IN WFC 2013-2014

1 Peking University (PKU) 4.10 6.48 28 58.0%
2 Tsinghua University (TH) AB 49020 88H
3 BGI NS 2.84 14 59.1%
4 Shanghai Institutes for Biological Sciences (SIBS), CAS 8A
5 Institute of Biophysics (IBP), CAS 0.13 2.56 d 1,856.6%
6 Zhejiang University (ZJU) GB BH
7 National Institute of Biological Sciences, Beijing (NIBS) Ui ok 4 87.3%
8 Dalian Institute of Chemical Physics (DICP), CAS 089 G5.
9 University of Chinese Academy of Sciences (UCAS) 0.50 1.20 v7, 138.5%
10 Ocean University of China (OUC)
11 Institute of Physics (IOP), CAS 0.95 ie 5 18.0%
12 Huazhong University of Science & Technology (HUST) 005092 788.3%
13 Shanghai Institute of Organic Chemistry (SIOC), CAS 0.92 o
14 Yanshan University 78 BAH
15 University of Science and Technology of China (USTC) 1.69 0.82 df -51.6%
16 Harbin Institute of Technology (HIT) 07 BO 1,020.0%
17 China National Genebank (CNGB) 0.64 if
18 Second Military Medical University (SMMU) A
19 Nanjing University (NJU) 0.05 0.58 2 1,008.3%
20 Chinese Academy of Agricultural Sciences (CAAS) BAS B80.
21 Shanghai Jiao Tong University (SJTU) 0.18 0.54 il 193.1%
22 Nanjing Institute of Geology and Palaeontology, CAS 5 SB 8.
23 Chinese Academy of Medical Sciences & Peking Union Medical College (CAMS & PUMC) 0.13 0.53 3 322.5%
24 Yunnan University BO BO 87H
25 Institute of Oceanology, CAS (IOCAS) 0.50 1

Weighted fractional count (WFC) for each institution is shown to two decimal places only. When two or more institutions have the same WFC, their positions are determined by the thousandth place (or beyond). These results are based on the most recent data available as of 14 September 2015. Owing to continual refinements of the data, the figures in the database are liable to change and might differ from those printed in the supplements.
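The tie-breaking rule amounts to ranking on the unrounded WFC and rounding only for display. A minimal Python sketch, using hypothetical institutions and hypothetical unrounded values:

```python
def rank_by_wfc(wfc):
    """Rank institutions by unrounded WFC (highest first), then round to two
    decimal places for display, as in the printed tables."""
    ordered = sorted(wfc.items(), key=lambda item: item[1], reverse=True)
    return [
        (rank, name, f"{value:.2f}")
        for rank, (name, value) in enumerate(ordered, start=1)
    ]


# Hypothetical values: X and Y both print as 7.94 but differ in the third place.
wfc = {"Institution X": 7.9389, "Institution Y": 7.9412, "Institution Z": 8.1100}
for row in rank_by_wfc(wfc):
    print(*row)
```

This prints Institution Z (8.11) first, then Institution Y and Institution X, both displayed as 7.94. Python's sort is stable, so values that are identical to every decimal place would keep their input order; the footnote does not say how such exact ties are resolved.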
ADVERTISEMENT FEATURE


Fuzhou University 


Engine promoting innovation in Southeast China


Founded in 1958, Fuzhou University is in Fuzhou city, the capital of Fujian Province in southeast China. It is one of about 100 universities selected for the Ministry of Education’s prestigious 211 Project, which aims to strengthen higher education and scientific research.


Fuzhou University has been listed in the global top 1 per cent for chemistry, engineering and materials science research by Thomson Reuters’ Essential Science Indicators (ESI) 2014. The university was ranked 36th in Nature Index 2014 China and 23rd in ESI’s top 100 highly cited Chinese universities. In 2014, eight Fuzhou professors featured in Elsevier’s Most Cited Chinese Researchers. Some of the university’s latest research is highlighted below.


Chemistry 

Photocatalysis. Fuzhou University’s Research Institute of Photocatalysis, initiated by Xianzhi Fu in 1997, became a State Key Laboratory in 2013. It focuses on searching for new types of photocatalysts and co-catalysts for a range of applications. It has won one second-class Chinese National Award for Advancement in Science and Technology, two first-class provincial-level awards in the same category, and one first-class award from the People’s Liberation Army. In 2009, Fu was elected as an academician of the Chinese Academy of Engineering (CAE).


Industrial catalysis and biological analysis. The National Engineering Research Center of Chemical Fertilizer Catalysts (NERC-CFC) was founded by Kemei Wei, a CAE academician. Focused on environmentally friendly catalysts for ammonia and hydrogen production, exhaust gas treatment and clean fuel production, NERC-CFC has won five national awards and seven provincial and ministerial awards.

Advertiser retains sole responsibility for content

The Key Laboratory of Analysis and
Detection Technology for Food Safety of
the Ministry of Education (MOE) explores
electrochemiluminescence bioanalysis, nanobiosensors
and biomarker analysis in living
biological systems. The laboratory has been
awarded nine provincial and ministerial awards.


Materials science 
Jiaxi Lu, a key founding member of Fuzhou
University, established the field of crystalline
materials science. Over the past six decades, the
field has become an influential research subject
in China and one of the distinctive disciplines of
Fuzhou University. It focuses on the synthesis
of diverse crystalline materials, the relationship
between structures and properties, and the
application of specific crystalline materials to
magneto-optics, lasers, and nonlinear optics.


Researchers at the Institute of Advanced
Energy Materials have developed a new synthetic
strategy for the self-assembly of mesostructural
materials with controllable crystal phases, facets,
dimensions, sizes, pores and morphologies,
and have discovered the relationships between
the intrinsic characteristics of mesostructural
materials and their photovoltaic and
electrochemical properties.


The biomedical materials research group
focuses on novel biomedical materials and their
applications in diagnostics, theranostics, tissue
engineering and biosimulation.


Physics 

The Laboratory of Quantum Optics, led by
Shibiao Zheng, a Yangtze River Scholar Professor,
has proposed many important schemes based on
cavity quantum electrodynamics for realizing
entanglement and quantum logic operations. One of
the group’s papers published in Physical Review Letters
has been cited more than 700 times in Thomson
Reuters’ Science Citation Index journals. Zheng
won a second-class National Award for Natural
Science, and the National Award for Youth in
Science and Technology.

The National Engineering Laboratory for
Flat Panel Display focuses on the design,
preparation and performance optimization of
novel photoelectronic devices, such as printed
displays and light-harvesting devices.


Mathematics and computer science 

The Center for Discrete Mathematics and 
Theoretical Computer Science (DIMACS-FU), 
headed by Genghua Fan, became the MOE 
Key Laboratory of Discrete Mathematics with 
Applications in 2007. Its research focuses on
graph theory and combinatorics, and on mathematical
methods in very-large-scale integration (VLSI).


Contact 

Visit: http://www.fzu.edu.cn/ 

Fax: +86-0591-22866099 

E-mail: faomail@fzu.edu.cn 

Address: No.2 Xue yuan Road, Minhou, 
Fuzhou, Fujian, China, 350116 
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INSIDE VIEW 


With a history that extends back to 1902, 
Nanjing Agricultural University (NAU) is one 
of China’s oldest higher education institutions 
with an agricultural science focus. Now one 
of the top 50 universities in the country, NAU 
has matured into a comprehensive research 
university with 20 colleges offering hundreds of 
degree programmes and promoting cutting- 
edge research in basic and applied sciences. 
Here, Zhou Guanghong, president of NAU, 
discusses the development of the university 
and his vision for building a world-renowned 
institution for agricultural education and 
research. 


Q. What are NAU’s primary development 
goals for the next ten years? 

Our goal is to become one of the world’s
top universities for agricultural sciences. To
accomplish this, we must attract top-level 
researchers, provide world-class training to 
our students, advance our research in the 
agricultural, life and environmental sciences, 
and — crucially in my view — conduct 
innovative research that benefits society. 
NAU already ranks 78th in the 2015 National 
Taiwan University Ranking for the field of 
agriculture. By 2020, we hope to be among the 
top 50 universities globally in our core subject 
areas, and we want to be among the top 500 
universities in the Academic Ranking of World 
Universities in all subject areas by 2030. 

We recognize the importance of aligning 
our development goals with national interests. 
Specifically, our research should not be limited 
to agriculture and crop production, but it must 
also focus on the impact of agricultural and 
rural development as well as on the coordinated 
development of food security along with animal 
and human health. 


Q. How do you plan to achieve these 
goals? 

We have initiated several programmes to 
enhance the growth of our researchers, 
including the Zhongshan Scholar Project, which 
supports career development and encourages 
innovation from outstanding scientists 
selected for the programme. In addition, we
have established a postdoctoral faculty track
that enables us to identify the most promising
scholars for faculty positions. To promote
professional competence, we have adopted
international standards to recruit and evaluate
faculty members.
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Zhou Guanghong, president of Nanjing Agricultural University.

In addition to attracting top scholars to boost 
the strength of our research, we are reinforcing 
our core subject areas. We are establishing 
novel research platforms and building two new 
campuses to ensure we will have adequate space 
and research facilities as we grow. 

We recognize that international cooperation 
must play a key role as we evolve into a world- 
class university. NAU maintains ties with 
over 160 universities and research institutes 
around the world, and has established 10 
international research centres. We participate 
in many international collaborative research 
programmes and are continuously augmenting 
academic exchanges with world-leading 
institutions. We are now working on setting up 
more international exchange programmes for 
graduate students. 
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The soul of a university lies 
in its academic spirit, but 

a university also has the 
broader mission to serve 
all of society. 
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Q. What are NAU’s strengths? 

NAU is located in Nanjing, one of China’s great 
ancient capitals and the modern and vibrant 
capital city of Jiangsu Province. The university 
shares a proud historical tradition with the city 
as it was the first Chinese university to offer
four-year bachelor’s degrees in the agricultural
sciences. Today, as a national key university
under the Chinese Ministry of Education, we
balance that heritage with modern expertise
and leadership. In the area of agricultural
sciences, NAU currently ranks in the top 0.1
per cent globally according to Thomson Reuters
Essential Science Indicators, while we are ranked 
in the top 1 per cent in the areas of plant and 
animal science, environment and ecology, 
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and biology and biochemistry. In our core 
competencies, notable results from research at 
NAU in crop genetics and breeding, crop growth 
modelling, quality control and processing of 
agricultural products, pest control, veterinary 
medicine, utilization of agricultural waste, and 
bioorganic fertilizers have been published in 
prominent international journals. 


Q. How can academic research and 
student education contribute to each 
other? 

Academic research and student training
certainly go hand-in-hand and should
develop concurrently. NAU encourages its
undergraduate students to gain hands-on 
research skills by working directly on research 
projects. Our well-established research 
platforms and experienced faculty at NAU also 
give students the opportunity to see leading- 
edge research firsthand, giving them a better 
understanding of how their research fields are 
advancing. Of course, faculty members also 
benefit from student participation and their 
frequently innovative input in research projects. 
Currently, 75 per cent of the papers published by 
NAU researchers have graduate students as the 
first author. Recently, one of our PhD students 
was first author of a paper on jasmonate 
signalling published in Nature. 


Q. How does NAU contribute to regional 
and national development? 

The soul of a university lies in its academic spirit, 
but a university also has the broader mission to 
serve all of society. This is what we are striving 
for at NAU. To give an example, a team led by 
Wan Jianmin researching rice breeding and pest 
resistance successfully cloned several important 
genes in rice. The team’s research was published 
in Nature and other respected international 
journals, but more importantly, it helped to 
control the spread of rice stripe virus in southern 
China. Another example is NAU’s research 
centre for new rural development, which was 
founded in line with national priorities for 
promoting development in rural areas. To foster 
this type of consequential research, we are now 
emphasizing a more multidisciplinary approach 
and stressing social service when evaluating 
researchers. 
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Nanjing Agricultural University 


An Agricultural Research 
Pioneer in China 


As one of the Ministry of Education’s ‘211 
project’ universities, NAU is dedicated to 
fundamental and applied research in the 
field of agriculture. 


A selection of the NAU-led research on
national and global issues
includes:

Crop Science 
Gai Junyi, an academician at the
Chinese Academy of Engineering, is dedicated
to breeding new soybean varieties
and increasing their productivity and quality.
Wan Jianmin focuses on breeding rice
varieties with disease and pest resistance;
the results were published in Nature, Nature
Biotechnology and Nature Genetics. Zhang
Tianzhen, together with Z. Jeffrey Chen
at the University of Texas, sequenced the
whole genome of upland cotton (Nature
Biotechnology, 2015). Cao Weixing and Zhu
Yan’s research on crop information technology
covers crop growth modelling and
general knowledge models for crop management.
Ma Zhengqiang focuses on the
identification of powdery mildew and scab
resistance genes in wheat. Chen Fadi and
Hou Xilin study the breeding of chrysanthemums
and non-heading Chinese cabbage,
respectively, and have developed many new
varieties. Zhang Shaoling’s research uncovered
the mechanism of self-incompatibility
in pears.

Wu Yidong tested the ‘natural refuge’ strategy
for delaying insect resistance to transgenic
cotton crops (Nature Biotechnology,
2014). Wang Yuanchao and Zheng Xiaobo
identified new genes in signal transduction
and the gene regulation network of
plant-oomycete interaction. Han Zhaojun
carried out research on pest resistance targets
and established supporting technology that is
effective in controlling cotton bollworm and
Chilo suppressalis.
Animal Health 
Lu Chengping has focused on microbiol- 
ogy research associated with livestock. In 
2013, Lu’s lab was approved by the World 
Organization for Animal Health as the 
world’s only reference lab for the diagnosis 
of swine streptococcosis. Jiang Ping devel- 
oped two vaccines for swine to effectively 
control infection with PRRSV and PCV2 
viruses. Zhu Weiyun also developed pro- 
biotics for swine and enzyme preparations 
for chickens. 
Food Safety 
Zhou Guanghong and Xu Xinglian uncovered
the mechanisms underlying the formation
of volatile compounds in traditional
Chinese cured meat products. They have
also established systematic meat grading and
quality control standards. Zhou Yingheng
focuses on the marketing and circulation of
agricultural products, especially relating to
issues of food quality and safety risk control.
Environmental Sciences 
Shen Qirong works on technology for the 
production of bio-organic fertilizers from 
solid waste material such as straw and 
animal manure. Zhao Fangjie found that 
microorganisms in rice paddy soil are able 
to oxidize and volatilize heavy metals. Pan 
Genxing focuses on the production and 
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application of biochar from recycled straw.
Xu Guohua studies the molecular biology
of efficient nutrient and water use in crops.
Zhang Wenhua and Jiang Mingyi have 
many achievements in understanding the 
mechanisms of plant resistance to abiotic 
stresses. 

Agricultural Economics and Social 
Science 

Zhong Funing and Zhu Jing focus on the
theory and policy of national food security
in the context of globalization. Qu Futian
has introduced innovations in systems
of land property rights and regional eco-economic
evaluation theory. Wang Siming
has been undertaking research on China’s
agricultural civilization and the Chinese philosophy
of science and technology.


Focused on recruiting and retaining high- 
calibre faculty and building world-class 
campus facilities, NAU is committed to 
becoming a leading centre of agricultural 
education and research. 

Nanjing Agricultural University Human
Resources: http://rsrcw.njau.edu.cn/html/en/html/Imy/1.html


Contact 

Tel: 86-25-8439-5754 
Fax: 86-25-8443-2420 
Website: www.njau.edu.cn 
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Shenzhen University 


A rising star in south China 


Shenzhen University (SZU) is a compre- 
hensive research university with state- 
of-the-art facilities, high-calibre faculty 
members and a highly professional ad- 
ministration body. Together with the 
city of Shenzhen — China’s most suc- 
cessful Special Economic Zone — the 
university has been undergoing rapid 
growth since its foundation in 1983. 


In 2015, SZU received 600 million CNY
in research grants and won 205 project
grants from the National Natural Science
Foundation of China (NSFC). Thomson
Reuters’ Essential Science Indicators ranks
SZU among the top 1 per cent of institutions
in the world for the field of engineering.
In 2014, 869 research papers from SZU
were published in Science Citation Index
journals, including eight in Nature and its sister
journals. Some of SZU’s recent research is
highlighted below.
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Progress in optoelectronics and 
photonics 

Various groups at SZU are conducting 
groundbreaking research in the areas of 
optoelectronics and photonics. A group
led by Hanben Niu, an academician of the
Chinese Academy of Engineering (CAE),
made significant contributions to the
development of multimode and super-resolution
optical imaging. The group
developed a new form of photodynamic
therapy and a non-z-scanning multimolecule
fluorescence tracking system with
nanometre resolution. These imaging
methods have been applied to immunological
tracking, three-dimensional
DNA imaging, and disease diagnosis and
therapy, and have shed light on life science
and clinical research.


A team at the Nanophotonics Research
Centre (NRC) led by Xiaocong Yuan has
determined how to manipulate arbitrary
focused plasmonic fields to achieve
polarization-controlled directional
coupling of … plasmon polaritons. The
team … proposed and verified …
that can trap … metallic particles or nanowires.
… succeeded in manipulating …
achieving high communication …
Their research is crucial … of next-generation …
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… new advanced materials. Fan’s research …
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Another team, led by Wenjing Zhang, has
reassembled isolated atomically layered
two-dimensional materials into hybrid
heterostructures, creating new artificial
systems with rich functionalities and novel
optical properties.

A team led by Yiping Wang, recipient
of the National Science Fund for
Distinguished Young Scholars of China,
is devoted to the design and fabrication of
sensing devices in optical fibres to develop
all-optical micro total analysis systems.
Aiming to create an optical fibre that can
act as an all-in-one lab, or a ‘lab-in-fibre’, the
researchers are working on microfabrication
technology, in-fibre microstructures
and novel functional materials.


Researching engineering and 
materials 

A team led by Qingquan Li, president of 
Shenzhen University, and Renzhong Guo, 
a CAE academician, is researching mul- 
tisource geoinformation acquisition and 
services. The team has pioneered new 
techniques to dynamically acquire spatial 
data and apply them to geoenvironmen- 
tal monitoring. The researchers are also 
exploring new methods for large-scale 
vehicle and individual trajectory data 


analysis as well as data mining of social 
networks. The team won second prize in
the National Technology Invention Award
of China for their outstanding work on
road checking and measurement.
Guoliang Chen, a Chinese Academy
of Sciences (CAS) academician, and his
team at the Guangdong Province Key
Laboratory of Popular High Performance
Computers (PHPCs) have been striving to
build high-reliability, low-cost and easy-to-use
PHPCs. They have built KD- and
SD-series PHPCs based on the China-made
Loongson CPUs. The team has also
designed a parallel computing framework
that consists of universal representation,
partitioning and parallel computing of big
data, simultaneously addressing the challenges
of volume, velocity and large variety
of big data.
A team supervised by Feng Xing developed
a service-life design theory of marine
structures, involving studies on the failure
mechanism for material and structure
and the development of novel materials.
The team won second prize in the State
Technological Innovation Awards as well
as two ministerial and provincial-level
awards of China.


Figure: Number of NSFC grants; budgets for science and technology research (in 10,000 yuan); number of SCI papers.


A group led by Florian Stadler is inves- 
tigating soft-matter physics and chemistry 
with a special focus on the rheology of 
multistimuli polymers and polymer gels. 
Research on zwitterionic polymer solu- 
tions unveiled the influence of ions and 
zwitterion content on their properties. 
Polymer blends of dendrimers and poly- 
styrene were found to challenge the foun- 
dations of several theories on the phase 
structures of immiscible polymers. 


Exploring medicine and life science 
The functions and mechanisms of selenium
and icariin in resisting Alzheimer’s
disease have been uncovered by a research
team led by Jiazuan Ni, an academician of
CAS. The researchers have also made remarkable
progress in screening biomarkers
for early diagnosis of Alzheimer’s disease.
A group led by Hong Li is exploring 
how emotional stimuli affect executive 
cognitive processing. The group has established
that changes in functional
connectivity dynamics are associated
with the vigilance network, shedding light on
the relationship between the dynamics of
functional brain networks and individual
behaviours through distinct cortical processes.
Yuejia Luo and his team are explor- 
ing a broad area of social cognitive 
neuroscience. They combine brain im- 
aging methods, autonomic measures 
and behavioural observation meth- 
ods to investigate mood disorders 
and understand their neural mecha- 
nisms. The research has great clinical 
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implications for mental disorders and 
brain diseases. 
Functional connectivity among critical 
language regions in the human brain is 
being investigated by a team led by Lihai 
Tan. They have found cross-language 
differences in the brain networks serving 
speech and reading, lending support to the 
culture-specific theory of cortical organiza- 
tion of language. 

Xiongzhong Ruan and his group are
focusing on lipid-mediated chronic kidney
diseases. They have identified a ‘wiring diagram’
of lipid trafficking under inflammatory
stress, demonstrating the mechanism
by which inflammatory stress modifies
lipid homeostasis. Their study updates the
conventional understanding of the pathogenesis
of lipid-mediated tissue injury and
has important clinical implications.
A research team led by Deming Gou has
developed a simple, sensitive and specific
method for detecting circulating miRNAs,
providing a promising tool for clinically
diagnosing diseases based on miRNA biomarkers.
They have also identified a group
of miRNAs associated with pulmonary
arterial hypertension.


Recruitment of talented researchers 
SZU is seeking talented researchers and 
warmly welcomes outstanding scholars 
from around the world. We offer excellent 
compensation packages with large start- 
up funds and a free intellectual environ- 
ment. SZU strives to be an open, globally 
recognized leading university. 


SHENZHEN UNIVERSITY 
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Frontier Institute of Science and 
Technology 


An international and 
multi-disciplinary 
research environment 


Frontier Institute of Science and 
Technology (FIST) was established by 
Xi’an Jiaotong University in an effort to 
create a world-class, multi-disciplinary 
research institute. FIST is the first "special 
academic zone" at the university. It has 
introduced an international, scientific 
research management system that seeks 
to reform scientific research in China. 


Since its establishment, FIST has 
set up 11 multi-disciplinary research 
centres, which cover physics, chemistry, 
biology (including the life sciences and 
basic medicine), materials science, 
mathematics, computational science, 
engineering and other subjects. 


FIST aims to drive rapid innovation by 
becoming a hub for talented researchers 
from all over the world. So far, 44 scholars 
have joined FIST, over 40 percent of 
whom are either academicians or have 
been awarded national titles. Over the 
past five years, FIST has grown into a 
nationally and internationally renowned 
research institute that has made significant 
scientific contributions. 


Contact 

Tel/Fax: +86 29 83395131 
E-mail: fist@xjtu.edu.cn 
Website: fist.xjtu.edu.cn 


Address: 1 West Building, 99 Yanxiang 
Road, Yanta District, Xi’an, 

Shaanxi Province, P. R. China, 

710054 
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That’s why I chose FIST!


Yanzhen Zheng 
(1000 Young Talent program scholar; principal investigator at the Center 
for Applied Chemical Research) 


After completing a Marie Curie Fellowship in 2011, I decided to return to
China. FIST immediately drew my attention because, despite being in its
infancy, it offered a unique international and inter-disciplinary environment.
I believe that this is an ideal model for modern universities and research
institutes. Thanks to the great support from Xi’an Jiaotong University over
the last four years, both FIST and my research group have grown rapidly.
As the first explorers of a new system, we are very proud of the great
success of the inter-disciplinary research advocated by FIST. Based in
the historical city of Xi'an, the first stop on the former Silk Road, FIST is
expected to take a leading role in bringing a new wave of creativity and
innovation to China.


Xiaojie Lou 
(1000 Young Talent program scholar; principal investigator at the Multi- 
disciplinary Materials Research Center) 


The 21st century has witnessed China’s rapid economic and social growth. 
As one of only a few inter-disciplinary research institutes in China, FIST 
provides me with a better environment and more opportunities for doing 
cutting-edge research than other universities. I enjoy the freedom to
pursue my areas of interest and the open-minded atmosphere at FIST.


Guanghao Lu 
(1000 Young Talent program scholar; principal investigator at the Multi- 
disciplinary Materials Research Center) 


FIST is a paradise for academic research. I chose to join FIST because I
believe that inter-disciplinary collaboration makes research easier. At FIST,
we get to interact with researchers from different disciplines, with different 
research experience and backgrounds. The laboratories at FIST are well 
equipped for studying materials science, chemistry, physics and biology. 
Well known as the eastern terminal of the former Silk Road, the city of 
Xi'an is located in the Weihe River valley in the heart of China. Xi'an was 
the capital of China for more than 1,000 years and is home to many
famous historical sites, such as the Terracotta Army, the City Wall and the Wild
Goose Pagoda. The beautiful Qinling mountains are only 15 kilometres 
away. 


Pengfei Li 
(principal investigator at Center for Organic Chemistry) 


FIST provides me with the freedom to conduct research in whatever
subjects I am interested in. It offers very good working conditions, as
well as an excellent atmosphere for inter-disciplinary collaboration. The
relatively simplified bureaucratic procedures and the support services
provided by the administrative staff allowed me to rapidly build my labs
and now free me from the burden of paperwork so that I can spend
more time on my research. Furthermore, the unparalleled historical and
cultural richness and rapid modernization of Xi’an make a very good
mix for enriching my life here.
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